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Abstract 


This  is  the  Final  Report  of  a  three  year  project  to  “Develop,  Apply,  and  E- 
valuate  Wavelet  Technology”  conducted  under  ARPA  Order  #7092,  AFOSR 
Contract  #F49620-89-C-0125. 

The  Final  Report  has  two  parts.  This  part  contains  a  summary  technical 
report,  and  appendices  that  describe  outreach  activities  to  the  scientific, 
commercial,  and  governmental  communities  under  the  program. 

The  second  part  of  the  Final  Report  is  a  separately  bound  monograph 
titled  The  scalable  structuire  of  information:  An  essay  on  wavelet  technology 
and  its  application  to  bandwidth  management.  It  is  a  semi- technical  presen¬ 
tation  for  people  who  want  to  know  what  wavelets  are,  how  they  are  related 
to  conventionaJ  mathematical  tools,  and  what  they  are  good  for.  The  Essay 
describes  the  foundations  of  wavelet  technology;  provides  an  informal  intro¬ 
duction  (i.e.,  many  figures;  no  proofs)  to  the  many  types  of  compactly  sup¬ 
ported  wavelets;  and  describes  their  application  in  bandwidth  management. 
The  applications  emphasize  compression,  channel  coding,  and  numerical  so¬ 
lution  of  partial  differential  equations. 

Keywords:  Audio  compression;  Bandwidth  compression;  Biorthogonal  wave¬ 
lets;  Channel  coding;  Complex  wavelets;  Connection  coefficients;  Discrete 
functions;  Formal  derivatives  of  wavelets;  Higher  rank  wavelets;  Image  com¬ 
pression;  Numerical  solution  of  partial  differential  equations;  Orthogonal 
wavelets;  Scaling  function;  Transient  signal  analysis;  Wavelet  bases;  Wavelet- 
Capacitance  matrix;  Wavelet-Galerkin  method;  Wavelet  matrices;  Wavelet 
matrix  expansion. 
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Executive  Summary 

This  is  the  Final  Report  of  a  three  year  project  to  “Develop,  Apply,  and  E- 
valuate  Wavelet  Technology”  conducted  under  ARPA  Order  #7092,  AFOSR 
Contract  #F49620-89-C*0125.  The  Final  Report  has  two  parts.  This  part 
contains  a  summary  technical  report,  and  appendices  that  describe  outreach 
activities  to  the  scientific,  commercisd,  and  governmental  communities  under 
the  program. 

The  second  part  of  the  Final  Report  is  a  separately  bound  monograph 
titled  The  scalable  structuire  of  information:  An  essay  on  wavelet  technology 
and  its  application  to  bandwidth  management.  It  is  a  semi-technical  presen¬ 
tation  for  people  who  want  to  know  what  wavelets  are,  how  they  are  related 
to  conventional  mathematical  tools,  and  what  they  are  good  for.  The  Essay 
describes  the  foundations  of  wavelet  technology;  provides  an  informal  intro¬ 
duction  (i.e.,  many  figures;  no  proofs)  to  the  many  types  of  compactly  sup¬ 
ported  wavelets;  and  describes  their  application  in  bandwidth  management. 
The  applications  emphasize  compression,  channel  coding,  and  numerical  so¬ 
lution  of  partial  differential  equations. 
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1  Report  Summary 

In  OUT  work  on  this  contract,  we  completed  work  in  three  major  areas:  the 
mathematical  foundations  of  wavelet  theory,  and  applications  of  wavelets  to 
signal  processing  and  partial  differential  equations.  In  this  fined  technical 
report  we  summarize  the  results  of  our  investigations  in  these  three  areas 
and  report  on  directions  for  further  research. 

In  the  area  of  mathematical  foundations,  we  have  investigated  both  dis¬ 
crete  and  continuous  aspects  of  wavelet  theory.  The  original  wavelets  of 
Daubechies  [2]  and  Mallat  [25]  correspond  to  2-band  filter  banks;  they  split  a 
signal  (or  subspace  of  functions)  into  two  parts.  We  have  developed  [17, 14]  a 
far-reaching  generalization  of  the  wavelet  concept  (“‘higher  rank  wavelets”) 
to  the  case  where  a  signal  is  split  into  m  different  pieces  or  frequency  bands^. 
This  generalization  encompasses  block  transforms  such  as  the  FFT  as  well 
as  overlapped  transforms,  all  within  a  multiresolution  framework.  We  have 
carried  out  much  of  the  detail  work  necessary  to  implement  this  generaliza¬ 
tion  (cf.  [12,  13]).  We  have  also  developed  two  distinct  parametrizations 
of  the  “rank  2”  wavelets,  which  prove  useful  in  adaptive  choice  of  a  wavelet 
for  engineering  applications.  In  the  continuous  realm,  we  have  investigated 
wavelet  expansions  of  function  spaces  -  characterizing  those  wavelets  which 
yield  orthonormal  bases  of  L^(R)  and  describing  the  local  behavior  of  wavelet 
functions,  establishing  connections  with  classical  analysis. 

In  our  investigations  of  the  applications  of  wavelets  to  signal  processing, 
we  have  had  two  major  foci:  image  compression  and  computational  algo¬ 
rithms.  We  have  developed  a  wavelet-based  still  image  compression  system 
[56]  with  performance  superior  to  that  of  traditional  DCT-based  algorithms. 
Gopinath  and  Burrus  [8]  have  studied  computation  with  the  wavelet  trans¬ 
form,  and  Gopinath,  Burrus,  and  Lawton  [11]  have  investigated  the  approx¬ 
imation  of  linear  translation- invariant  operators  with  the  wavelet- Galerkin 
operator.  Furthermore,  we  have  investigated  connections  between  the  wave¬ 
let  and  Fourier  transforms  and  described  the  characteristics  of  some  funda¬ 
mental  signals  in  wavelet  phase  space. 

We  have  applied  wavelet  based  numerical  methods  to  the  solution  of  par¬ 
tial  differential  equations  [6,  18,  19,  33,  34,  35,  54].  Specifically,  we  compare 
the  Wavelet-Galerkin  method  to  standard  numerical  methods  for  the  numer- 


*here  m  is  an  integer  greater  than  or  equal  to  two. 


5 


ical  solution  of 

•  The  Euler  equations  of  a  two-dimensional,  incompressible  fluid  in  a 
periodic  domain. 

•  The  Bihaxmonic  Helmholtz  equation  and  the  Reduced  Wave  equation 
in  nonseparable,  two-dimensional  geometry. 

•  The  Euler  and  Navier-Stokes  equations  in  nonseparable,  two-dimensional 
geometry. 

The  wavelet  methods  have  significant  advantages  with  regard  to  stability, 
accuracy,  and  rate  of  convergence. 

2  Mathematical  Foundations 

2.1  Higher  Rank  Wavelets 

We  have  generalized  the  two-band  wavelets  of  Daubechies  and  Mallat  to  the 
m-band  (or  rank  m)  case,  yielding  a  large  family  of  transforms  which  include 
block  transforms  such  as  the  FFT,  as  well  as  overlapping  generalizations  of 
these  m  X  m  block  transforms.  Furthermore,  these  “higher  rank  wavelets”  are 
well-suited  to  use  in  a  multiresolution  analysis  tree.  They  have  applications 
in  a  broad  range  of  areas  from  image  and  audio  compression  to  interference 
cancellation  and  transient  detection. 

A  wavelet  matrix  is  an  mxmg  matrix  A  with  complex  entries  al  satisfying 


£  =  m6*'*’Si,o  ,  and 

k 

(1) 

53  Ofc  =  . 

(2) 

k 


The  first  row  of  the  wavelet  matrix  is  called  the  lowpass  row  since  it 
corresponds  to  a  lowpass  filter,  and  the  other  rows  are  the  wavelet  or  highpass 
rows.  The  genus  g  is  the  number  of  m  x  m  blocks  which  make  up  the  wavelet 
matrix.  The  conditions  (l)-(2)  can  be  rephrased  in  the  language  of  polyphase 
factorizations  of  filter  banks;  for  details  see  [17]. 

The  wavelet  matrices  of  genus  1  play  a  special  role;  these  m  x  m  matrices 
are  called  Haar  wavelet  matrices;  they  include  a  wide  range  of  block  trans¬ 
forms  such  as  the  FFT,  Discrete  Cosine  Transform,  and  Hadamard- Walsh 
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transform.  Furthermore,  each  wavelet  matrix  (of  arbitrary  genus)  has  such 
a  Haar  matrix  associated  with  it,  by 

=  (3) 

I 

This  Haar  matrix  determines  the  frequency  partitioning  properties  of  the 
corresponding  wavelet  filter  bank. 

Pollen  [27]  at  Aware  has  carried  out  a  parametrization  of  the  rank  2 
wavelets,  describing  a  genus  g  wavelet  by  means  of  y  —  1  angular  parameters. 
We  are  currently  working  on  the  generalization  of  this  result  to  arbitrary 
rcink. 

We  have  observed  [17]  that  the  orthogonality  condition  (1)  implies  that  a 
wavelet  system  provides  an  orthonormal  representation  for  discrete  functions: 

Theorem  2.1  Given  a  signal  f{n)  and  anm  x  mg  wavelet  system  a^,  then 
f  has  a  unique  wavelet  expansion 


m-l 

/(”)  =  E 


JSO 


OP 


/ss«>0P 


where 

C/  =  —  E/(”X»+n  • 

Higher  Rank  Daubechies  Wavelets 

One  of  the  first  questions  to  arise  in  the  theory  of  higher  rank  wavelets 
is  “what  is  the  generalization  of  Daubechies’  regular  compactly  supported 
wavelets  to  the  rank  m  setting?”  Regularity  is  important  because  regular 
filters  “pass  through”  polynomial  information  and  because  regular  filters  lead 
to  regular  (smooth)  scaling  functions.  Given  an  m-band  FIR  wavelet  filter 
(oo,  oi, .  • . ,  as)  with  frequency  response 

A{en  =  J^a,e^'^ 

k 

we  say  that  the  filter  is  regular  of  order  N  iff  the  frequency  response  A  is 
flat  (has  a  zero)  of  order  AT  +  1  at  the  m-th  roots  of  unity  u  =  2nifm. 
There  are  numerous  equivalent  formulations  of  regularity;  they  are  discussed 


Figure  1;  Frequency  response  for  regular  wavelet  filters:  JV  =  1,  m  =  2,3,4 

in  [13,  40].  We  have  constructed  rank  m  wavelet  lowpass  filters  with  .A^-th 
order  regularity  and  derived  an  explicit  formula  for  the  “scaling  coefficients” 
(i.e.  lowpass  filter  taps).  The  key  is  to  begin  with  the  Haar  lowpass  filter 
h  =:(ao  =  1,  ui  :=  1,  . . . ,  Om-i  =  1)  and  form 

By  examining  the  square  modulus  of  A  and  enforcing  the  orthogonality  con> 
dition  (1),  we  find  an  explicit  formula  for  |Q(e“*’)|*  and  thus  A;  the  details 
appear  in  [13].  Figure  1  displays  the  frequency  responses  of  the  N  =  1  regular 
wavelet  filters  for  m  =  2,3,4. 

Having  derived  the  rank  m  wavelet  lowpass  filter  using  the  above  con¬ 
struction,  we  are  then  able  to  construct  a  full  rank  m  wavelet  matrix,  given 
the  lowpass  filter  and  the  desired  characteristic  Haar  matrix.  This  construc¬ 
tion  is  described  in  [17]  and  [12];  it  amounts  to  solving  a  set  of  1  linear 

equations  for  a  wavelet  matrix  of  rank  m  smd  genus  g.  Furthermore,  given  a 
valid  wavelet  matrix  with  regularity  of  order  N.,  we  are  able  to  construct  rank 
m  wavelet  tight  frames  for  L^{fL).  Specifically,  the  rank  m  scaling  function 
is  the  solution  of  the  equation 

-  k) ; 

k 


(4) 
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Basis  elements  phi,  psi_l,  psi_2,  psi_3  with  regularity  N=3 


if  the  scaling  coefficient  set  has  regularity  of  order  N,  then  the  resulting 
scaling  function  will  have  XN  continuous  derivatives,  for  some  constant  A 
independent  of  N.  We  can  then  construct  the  m  —  1  wavelet  functions  by 

0*(x)  =  ^  —  k)  y  s  =  1,  2, . . . ,  m  —  1  .  (5) 

k 

The  collection  of  functions 

form  a  tight  frame  and  in  most  cases  an  orthonormal  basis  for  ^^(R);  details 
are  to  be  found  in  [14,  20].  Figure  2  shows  the  functions  for  a 

rank  4  wavelet  system  with  regularity  order  N  =  3. 

We  have  discovered  [16]  that  when  using  a  regular  wavelet  filter  (of  ar¬ 
bitrary  rank  and  regularity  order  AT),  polynomials  are  generalized  eigenfunc¬ 
tions  for  the  wavelet  transform.  That  is,  a  sequence  h{k)  whose  elements  are 
a  polynomial  in  k  of  degree  less  than  or  equal  to  N  will  emerge  as  a  polyno¬ 
mial  sequence  of  the  same  degree  under  the  operation  of  convolution  with  a 
wavelet  filter  and  decimation  by  a  factor  of  m.  This  property  characterizes 
regular  wavelet  lowpc..3s  filters,  and  should  prove  useful,  for  example  in  the 
detrending  of  signals  with  an  underlying  polynomial  trend.  When  using  a  full 
wavelet  filter  bank,  a  purely  polynomial  input  will  produce  nonzero  outputs 
only  for  the  lowpass  filter. 
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We  have  also  made  significant  steps  [24]  in  the  development  of  wavelet 
theory  for  dimensions  greater  than  one.  In  this  setting  the  sampling  sublattice 
becomes  paramount;  in  the  classical  one  dimensional  rank  2  case,  the  sam¬ 
pling  sublattice  is  the  even  integers  2Z  C  Z.  While  Mallat  [25]  introduced 
a  separable  wavelet  theory  in  multiple  dimensions  (i.e.,  multidimensional 
wavelets  as  tensor  products  of  one- dimensional  wavelets),  the  fully  general 
theory  outlined  in  [24]  permits  the  use  of  a  nonseparable  subsampling  ma¬ 
trix  M.  Specifically,  a  general  n-dimensional  wavelet  system  is  given  by  an 
n-dimensional  lattice  A  C  R**,  a  matrix  M  with  integer  entries  and  a  set  of 
coefficients  ax  satisfying: 

•  MAC  A 

•  M  is  expansive  (all  eigenvalues  strictly  greater  than  one) 

•  Eaga  MS*'*' 6 , 

•  Ea  flA  =  det  . 

The  scaling  and  wavelet  functions  will  then  be  given  by  appropriate  gen¬ 
eralizations  of  the  formulas  (4)  and  (5).  One  can  seek  regular  wavelets  as 
we  did  above  for  the  one  dimensional  rank  m  case;  preliminary  work  on  this 
problem  is  reported  in  [30].  Figure  3  displays  a  nonseparable  two-dimensional 
wavelet  (the  “Novon”  multiplier)  with  regularity  order  N  — 

2.2  Parametrization  of  Wavelets 

Parametrization  of  compactly  supported  wavelets  is  essential  for  effective 
choice  of  a  wavelet  basis  in  a  particular  application.  We  have  arrived  at  t- 
wo  independent  paxametrizations  of  compactly  supported  wavelets,  that  of 
Pollen  [27]  cited  earlier  and  one  due  to  Wells  [55].  We  summarize  Wells’ 
parametrization  of  regular  compactly  supported  wavelets  here.  Given  a  scal¬ 
ing  coefficient  set  {a*}  with  frequency  response  i4(c’"),  Daubechies  describes 
the  scaling  coefficients  with  regularity  of  order  N  as  those  for  which 

A(c'")  =  (1-1-  e^)^Q(e’^) . 

She  solves  for  Q  by  changing  variables  to  y  =  sin^(u;/2),  examining 

M(c-)p 


Figure  3:  Novon  scaling  function  with  regulaxity  order  N  =  1 
and  finding  that 

I0(e“)p  =  Z  (jv  :  1  +  y^Piv)  ■  (6) 

fc=o  '  ' 

subject  to  several  constraints  on  the  polynomial  P.  In  particular, 

P(l/2  —  y)  must  be  an  odd  polynomial  (7) 

and 

E  (iv  1 1  ■^  *)  »* + y^PM  ^  “•  (*) 

k=0  '  ' 

Wells  takes  this  description  as  a  pau-ametrization,  called  the  reduced  moduli 
space,  of  all  wavelet  scaling  coefficients  sets  with  regularity  of  order  N.  For 
each  regularity  order  N  and  degree  M  of  the  polynomial  P  (i.e.  M  free 
parameters  to  the  wavelet)  he  describes  a  polyhedron  in  parameter  space, 
each  point  of  which  gives  rise  to  a  valid  polynomial  P  satisfying  (7)  and 
(8).  Furthermore,  the  vertices  of  the  polyhedron  lie  on  the  boundary  of  the 
reduced  moduli  space,  so  that  in  some  sense  this  polyhedron  is  maximal.  The 
signal  processor  can  then  optimize  over  the  parameter  space  for  an  additional 
desired  trait  (such  as  the  stopband  performance  of  the  wavelet  lowpass  filter). 
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2.3  Wavelets  and  Classical  Analysis 

Wavelets  and  Weierstrass  functions 

Resnikoff  [38]  at  Aware  has  discovered  a  link  between  the  compactly  sup¬ 
ported  wavelets  of  Daubechies  and  Weierstrass’  famous  example  of  a  nowhere 
differentiable  continuous  function.  In  pju'ticular,  he  has  found  that  the  s- 
caling  function  ^  can  be  expressed  as  an  infinite  series  of  harmonics  of  a 
Weierstrass  function.  Weierstrass’  original  example  was  the  infinite  series 

00 

a*  cos  b'‘vx  ; 

k=0 

we  generalize  this  to  allow  the  parameter  a  to  be  an  n  x  n  matrix  and  define 
the  Weierstrass  function  with  matrix  parameter  a  to  be 

W[a,6]  :=  .  (9) 

k=Q 

We  now  break  the  scaling  function  (p  of  equation  (4)  into  pieces  defined  on 
the  unit  interval  (0, 1); 


^j{x)  ;=  (p{j  +  x)x(o.i](af) ;  (10) 

here  X{o,i](2^)  denotes  the  characteristic  function  of  the  unit  interval.  Resnikof- 
f  [38]  proves  that  the  vector-valued  function  $(x)  =  {^>(x)}^j  can  be  rep¬ 
resented  as  a  series  of  “Weierstrass  harmonics”: 


^x)  =  Co+'£C,W[T,2]{qx)  (11) 

9  odd 


where  the  matrix 


T  :=  (Tj,)  =  ( 


<^2}-k  02j+l-k 

2 


) 


is  derived  from  the  scaling  coefficients  a/t  and  the  C,  can  be  determined  from 
<p.  Furthermore,  Resnikoff  has  discovered  that  if  the  scaling  function  tp  is 
regular  of  order  N,  (e,g.  the  wavelet  moments  vanish  up  through  order  N), 
then  each  of  the  functions  is  a  scalar  multiple  of  the  Bernoulli  polynomial 
of  degree  j.  A  detailed  discussion  and  proofs  appear  in  Resnikoff  [38]. 


*1 
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Daubechies  Scaling  Function  on  [0,3) 

Pollen  [29]  has  used  the  first  principles  of  analysis  to  investigate  the  s- 
caling  function  v’dh  associated  with  Daubechies  4-element  scaling  coehBcient 
set  “D4” .  He  is  able  to  prove  the  foUowing  results: 

•  (^£)4  takes  its  v2Jues  in  Q[\/3]> 

•  v^£)4  is  continuous. 

•  vsxj4  is  not  right-differentiable  at  any  dyadic  integer  ^  in  [0,3). 

•  ^D4  is  left-differentiable  at  every  dyadic  integer  in  (0,3). 

2.4  Wavelet  Frames  and  the  Wavelet-Galerkin  Oper¬ 
ator 

Lawton  at  Aware  has  investigated  [20, 21, 22]  the  properties  of  rank  2  wavelet 
expansions  of  the  function  space  i*(R).  He  has  been  able  to  prove  that  every 
compactly  supported  wavelet  gives  rise  to  a  “tight  frame”  for  X*(R),  and  he 
has  also  characterized  the  exceptional  set  of  those  wavelets  which  yield  a 
tight  frame  but  not  an  orthonormal  basis.  Lawtons’s  approach  keys  on  the 
wavelet- Galerkin  operator,  defined  as  follows: 

Given  a  valid  wavelet  scaling  sequence  a,  form  the  operator  Sa  on  se¬ 
quences  in  P  by 


Sa{h){k)  =  2'^amanh{2k  +  m-n)  .  (12) 

m 

The  quadratic  orthogonality  condition  (1)  means  that  the  Dirac  sequence 
^(A:)  is  an  eigenvector  for  Sa  with  eigenvalue  1.  Denote  by  W  the  subset  con¬ 
sisting  of  those  scaling  sequences  for  which  Sa  hzis  more  than  one  eigenvector 
with  eigenvalue  1.  Use  to  denote  the  set  of  scaling  sequences  of  length 
2N  and  define  Wn  =  Wr\  In  [20]  Lawton  proves  the  following 

Theorem  2.2  For  N  >  1,  let  a  €  Vs,  Itt  and  ij)  be  the  scaling  function 
and  wavelet  constructed  from  a,  and  let  Sa  be  as  above.  Then  the  set  of 


’A  dyadic  integer  is  a  rational  number  x  such  that  there  exists  AT,  g  Z  with  €  Z. 
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(normalized)  dyadic  scalings  of  integer  translates  of  ^  forms  a  tight  frame 
for  L^(R),  i.c.  for  any  g  € 

9  -  • 

i,k 

Furthermore,  for  N  >  2,  Ws  is  a  nonempty  proper  algebraic  subset  of  Vs 
and  the  tight  frame  forms  an  orthonormal  basis  unless  a  €  IVjv. 

In  [21],  Lawton  has  been  able  to  turn  this  result  into  a  characterization 
of  those  wavelet  bases  which  are  orthonormal  bases; 

Theorem  2.3  Let  a  Q.  Vs-  Then  the  tight  frame  of  wavelets  constructed 
using  a  is  not  an  orthonormal  basis  if  and  only  if  a  €  Ws,  i.c.  iff  Sa  has  1 
as  an  eigenvalue  with  multiplicity  greater  than  1. 


2.5  Future  Directions 

The  theory  of  higher  rank  wavelets  presented  above  provides  a  high-level, 
unified  framework  encompassing  many  of  the  transforms  used  in  signal  pro¬ 
cessing  today.  Promising  research  directions  exist  in  this  area:  parametriza- 
tion  of  the  family  of  higher  rank  wavelets,  the  theory  of  biorthogonal  rank 
m  wavelets  (the  rank  2  theory  was  introduced  in  [1]),  and  the  application  of 
higher  rank  wavelet  transforms  to  specific  application  areas,  such  as  image 
processing  and  sonar  transient  detection. 
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Figure  4:  Aware’s  wavelet-baaed  image  compression  system 

3  Signal  Processing 

3.1  Image  Processing 

In  the  course  of  its  work  on  the  contract,  Aware  has  developed  a  state-of- 
the-art  wavelet-based  still  image  compression  system.  We  outline  the  design 
of  the  system  and  its  performance  below;  greater  detail  appears  in  [56]  and 

[37). 

Our  system  for  (lossy)  compression  of  images  is  built  of  three  components: 
a  (wavelet)  transform,  a  (lossy)  quantizer,  and  a  (lossless)  encoder,  depicted 
in  Figure  4. 

The  transform  is  a  rank  2  wavelet  transform,  which  operates  on  a  discrete 
data  stream  by  convolution  and  decimation.  In  particular,  we  have  two  filters, 
a  lowpass  filter  {aj^}  and  a  highpass  filter  {bk}  with 

h  =  (— l)*aA^-fc 

and  from  a  one-dimensional  input  signal  /(n)  we  obtun  the  two  outputs 

Lf{n)  =  (/  ♦  a)(2n)  (13) 


and 


/f/(n)  =  (/*6)(2n). 


(14) 
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In  order  to  apply  this  one-dimensional  transform  to  a  two-dimensional  ar¬ 
ray  of  pixels  /  representing  an  image,  we  apply  the  1-d  transform  first  in 
the  X  direction  to  produce  two  output  images  Lgl  and  and  then  ap¬ 
ply  the  1-d  transform  in  the  y  direction  to  these  to  obtain  4  output  images, 
LgLyl,  LxHyl,  HxLyl,  HxHyl.  Wc  theu  iterate  this  procedure  on  the  subim¬ 
age  LxLyl  in  a  Mallat  tree  structure.  ®  This  iterative  decomposition  of  an 
image  is  depicted  in  Figure  5.  Observe  that  because  of  the  critical  decima¬ 
tion,  the  output  of  the  transform  has  exactly  as  many  data  points  as  the 
input. 

The  next  step  is  to  quantize  the  values  of  the  transform  coefficients;  we 
experimented  with  a  number  of  scalar  and  vector  quantizers,  and  ultimately 
found  the  most  effective  method  to  be  a  uniform  scalar  quantizer  with  dif¬ 
ferent  binwidths  for  the  different  output  bands.  One  reason  for  this  is  that 
the  high-frequency  components  (such  as  the  HxHyl  output)  are  concentrated 
aroimd  edges,  and  image  intensities  at  edge  discontinuities  are  known  to  be 
approximately  Laplacian-distributed.  We  can  incorporate  these  Laplacian 
distributions  into  computations  for  a  uniform  scalar  quantizer  rather  easily, 
as  described  in  [7]. 

Once  the  transform  coefficients  have  been  quantized,  we  further  exploit 

^We  have  investigated  alternative  methods  of  iterating  the  transform,  e.g.  further 
decomposing  the  LgHyJ  term  in  z;  details  are  presented  in  [37]. 
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redundancies  in  the  data  by  lossless  encoding,  such  as  Huffman  coding  or  an 
adaptive  coder  such  as  the  Q-coder  developed  at  IBM  [32].  We  have  obtained 
our  best  results  with  the  (nonadaptive)  Huffman  coder. 

Upon  implementing  this  wavelet  image  compression  system,  we  have 
foimd  it  to  exhibit  several  advantages  over  Fourier-based  methods.  First, 
since  we  used  compactly  supported  wavelet  filters  (cf.  [2]),  edges  contribute 
to  a  small  number  of  transform  coefficients.  This  greatly  reduces  the  "ring¬ 
ing”  associated  with  Fourier  methods,  where  a  discontinuity  has  a  broadband 
spectrum,  and  quantization  can  produce  artifacts  across  the  support  of  the 
Fourier  components.  Fourier  methods  such  as  the  DOT  require  that  the  im¬ 
age  be  broken  into  subblocks  (usuaUy  8  x  8  or  16  x  16),  both  lo  bound  the 
0{N  log  N)  complexity  and  to  limit  the  ringing  just  described.  This  produces 
quilt-like  artifacts  once  the  transform  coefficients  are  quantized,  as  differen- 
t  subblocks  end  up  with  different  total  intensities.  A  significant  advantage 
provided  by  our  system  is  that  no  subbblocking  of  the  image  is  necessary 
(the  wavelet  transform  has  0{N)  complexity),  and  these  blocking  artifacts 
are  eliminated.  Finally,  the  local  nature  of  the  computations  (13)  and  (14) 
enables  efficient  hardware  layouts  for  wavelet  transform  computations. 

3.2  Signal  Detection  and  Classification 

Workers  at  Aware  have  investigated  numerous  areas  of  signal  processing  with 
wavelets,  including  an  analysis  of  wavelet  transform  computations  [8],  a  study 
of  wavelets  and  linear  translation- invariant  operators  [11],  an  analysis  of  the 
effect  of  multirate  filtering  on  convolution  [15],  discovery  of  new  relationships 
between  the  wavelet  and  Fourier  transforms  [39],  and  an  investigation  of 
wavelet  phase  space  [36]. 

Gopinath  and  Burrus  [8]  have  developed  efficient  means  of  computing 
the  continuous  wavelet  transform  for  an  arbitrary  wavelet  by  using  the  dis¬ 
crete  wavelet  transform  as  an  intermediary.  Define  the  continuous  wavelet 
transform  of  a  function  /  with  respect  to  a  wavelet  w  by 

r)  =  -  r)dt 

and  write  the  coefficients  of  the  discrete  wavelet  transform  of  a  function  g 
with  respect  to  the  wavelet  tj;  (as  in  (22)  as 
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Then  Gopinath  and  Burrus  find  that 

(15) 

In  other  words,  a  general  continuous  wavelet  transform  may  be  computed  as 
a  double  summation  of  the  discrete  wavelet  transform  of  /  and  w  against  a 
kernel  which  is  the  continuous  wavelet  transform  of  the  wavelet  ^  with  itself. 
Notice  that  the  kernel  can  be  precomputed  once  (as  may  be  the  DWT  of 
the  wavelet  w,  once  it  is  known),  and  so  the  continuous  wavelet  transform 
computation  is  reduced  to  a  discrete  wavelet  transform  computation,  which 
may  be  done  efficiently  by  lattice  methods  such  as  in  [47]. 

Linear  translation- invariant  operators  (such  as  differentiation)  can  be 
well-represented  by  the  wavelet-Galerkin  method;  this  is  developed  in  Gopinath, 
Lawton,  and  Burrus  [llj.  Consider  a  linear  translation- invariant  operator  T 
(e.g.  T  =  d^jdx^)',  it  can  be  represented  as  convolution  with  a  kernel  t(z). 

In  the  case  of  the  Galerkin  method  we  approximate  the  operator  T  defined 
on  L*(R)  by  an  operator  Tj  which  is  the  projection  of  T  onto  a  subspace  Va, 
associated  with  a  meshsize  Az:  Ti  =  where  Pax  is  the  projection 

onto  Va*  =  Span  |yi/Az(/?(z/Az  —  A:)j.  Thus  we  obtain  an  expansion 

P Ax/  “  fAr,k^Ax,k 

k 


and 


9^kx,k  =  f^x,k  *  f  Ax 


(16) 


where 


fAx,fc  —  (^Ax,0»  P^^AxJk)  " 


Thus  the  wavelet-Galerkin  discretization  of  T  acts  as  a  discrete  convolution 
on  the  expansion  coefficients  of  f^.  It  is  proved  in  [11]  that 


•  If  a  function  /  can  be  well-approximated  locally  by  polynomials  of 
degree  less  than  or  equal  to  n,  then  the  action  of  the  operator  T  is 
well-approximated  by  convolution  of  the  samples  of  /  with  the  kernel 
tk  that  acts  on  the  expansion  coefficients. 


•  If  the  wavelets  being  used  in  this  approximation  procedure  have  mo¬ 
ments  vanishing  to  order  M,  then  differentiation  of  order  p  <  2m  is 
well  approximated  by  the  wavelet-Galerkin  discretization 
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Heller  [15]  has  investigated  the  effect  of  multirate  hitting  (convolution 
followed  by  decimation)  upon  hnear  convolution.  As  the  core  operation  of 
lineax  time-invariant  systems,  convolution  is  perhaps  the  most  significant 
operation  of  signal  processing.  The  Fourier  transform  is  important  precisely 
because  it  diagonalizes  convolution,  and  yet  in  doing  so  it  loses  all  time- 
domain  information.  Multirate  filter  banks  provide  both  time  and  frequency 
information  in  their  output,  yet  lose  the  precision  of  the  Fourier  transform  in 
dealing  with  convolution.  Heller’s  work  in  [15]  is  a  first  step  in  understanding 
the  effect  of  convolution  and  decimation  on  linearly  convolved  inputs. 

Resnikoff  and  Burrus  [39]  have  discovered  a  set  of  formulae  relating  the 
Fourier  expansion  of  a  periodic  signal  with  its  wavelet  expansion.  Suppose 
that  g(i)  is  a  periodic  function  on  the  real  line  (i.e.  =  g{t  1))  whose 

Fourier  expansion  is 

9(<)=  f:  6(n)e“”' 

oo 

with  coefficients 

6(n)=  [' g{t)e-^^*^*dt , 

Jo 

Analogously,  the  wavelet  expansion  of  g  is 

9(1)  =  c{l)^i{t)  +  f^  £  (17) 

/=— 00  j=0*=— oo 


feO)  =  V'^rliVi  -  k) 

and  the  family  {v’/jV’j.fc}  form  an  orthonormal  basis  for  X'*(R)  as  described 
in  [21]. 

Resnikoff  and  Burrus  use  the  properties  of  the  scaling  function  (f  and 
wavelet  iir  combination  with  the  periodicity  of  g  to  derive  the  facts  that 


c(l)  =  c(0)  =  c  for  all  / . 


They  then  express  the  Fourier  coefiQcients  6(n)  in  terms  of  the  wavelet  coef¬ 
ficients  and  the  Fourier  transform  of  the  wavelet  0: 


if  n  =  0 
if  n^O  ■ 


(18) 
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Conversely,  they  find  that  the  wavelet  coefficients  of  g  in  terms  of  the 
Fourier  coefficients: 

*)  =  .  “d  (19) 

—00 

c(/)  =  c  =  6(0) .  (20) 

ResnikofF  [36]  has  developed  and  investigated  the  notion  of  wavelet  phase 
space  -  the  space  of  wavelet  coefficients  which  provides  a  time-frequency 
picture  of  a  function.  In  particular,  suppose  we  are  given  a  rank  2  wavelet 
system  {ajt}  the  associated  scaling  function  ip  and  wavelet  0,  so  that  an 

ftmction  / (x)  has  the  expansion 

/(2:)  =  XI  +  51  'll  Cjkrlfjk{x)  .  (21) 

<€Z  j€Z4  keZ 

Alternatively,  we  can  omit  the  scaling  function  part  and  use  the  full  wavelet 
expansion 

/(^)  =  Cjk^Mx)  •  (22) 

i€Zfc€Z 

Whereas  a  Fourier  transform  analyzes  a  signal  in  terms  of  its  frequencies, 
a  wavelet  transform  breaks  the  information  content  of  a  signal  into  scalt 
and  time  components.  The  index  j  determines  the  scale  while  the  index 
k  represents  time  (or  spatial)  translation  within  that  scale.  The  collection 
of  wavelets  orthonormal  basis  for  the  information  (or 

detail)  at  scale  j;  The  detail  over  all  scales  j  with  j  <  0  is  amalgamated  into 
the  set  of  scaling  functions  fpi{x)  =  *p{x  —  /)  in  the  expansion  (21). 

We  can  now  plot  the  magnitudes  of  the  coefficients  Cjk,  with  the  x  axis 
as  the  k  or  time  index  and  the  y  axis  as  the  j  or  scale  index,  providing  a 
time-scale  representation  of  the  energy  in  the  signal  /.  Several  examples 
are  worked  out  in  detail  in  [36];  we  describe  one  of  them,  the  Dirac  delta 
function,  here. 

The  Dirac  distribution  has  the  full  wavelet  expansion 

^(a:)  =  SS  2^/V(-fc)V'>k(x) ; 

ieZ  fceZ 

the  magnitudes  of  these  coefficients  for  a  fixed  basis  {^jk}  &re  shown  in  figure 
6.  The  magnitudes  of  the  coefficients  grow  exponentially  with  j  even  as  the 


location 

Figure  6:  Wavelet  phase  space  plot  of  the  Dirac  distribution 

support  width  in  k  is  shrinking.  A  finite  approximation  of  the  graph  of  the 
Dirac  distribution  is  shown  above  the  phase  space  plot,  while  a  plot  of  the 
magnitude  of  the  Dirac’s  Fourier  transform  is  shown  at  the  right.  Notice  that 
the  wavelet  phase  space  representation  accurately  describes  the  location  of 
the  Dirac,  which  the  Fourier  representation  is  unable  to  do. 

4  Wavelets  and  the  Numerical  Solution  of 
Partial  Differential  Equations 

We  have  applied  wavelet  based  numerical  methods  to  the  solution  of  partial 
differential  equations  [6,  18, 19,  33,  34,  35,  54].  Specifically,  we  compare  the 
Wavelet-Galerkin  method  to  standard  numerical  methods  for  the  numerical 
solution  of 
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•  The  Euler  equations  of  a  two-dimensional,  incompressible  fluid  in  a 
periodic  domain. 

•  The  Bih2irmonic  Helmholtz  equation  and  the  Reduced  Wave  equation 
in  nonseparable,  two-dimensional  geometry. 

•  The  Euler  and  Navier-Stokes  equations  in  nonseparable,  two-dimensional 
geometry. 

The  wavelet  methods  have  significant  advantages  with  regard  to  stability, 
accuracy,  and  rate  of  convergence. 

We  numerically  resolve  the  two-dimensional  Euler  equations  and  study 
the  phenomena  of  two-dimensional  turbulence.  The  compactly  supported 
wavelets  of  Daubechies  provide  an  orthogonal  basis  for  the  square  integrable 
functions  on  the  line  or  circle.  These  have  severaJ  advantages  for  the  nu¬ 
merical  approximation  of  solutions  of  differential  equations,  including  exact 
representation  of  polynomials  of  certain  degrees  and  compact-support.  For 
nonlinear  systems  (Euler  equations)  with  solutions  that  may  develop  sharp 
gradients  the  primary  advantage  seems  to  be  that  wavelets  can  accurately 
approximate  the  smooth  component  to  a  solution  while  correctly  resolving 
the  components  associated  with  strong  gradients  [54].  The  reason  for  this  is 
that  wavelets  are  less  smooth  than  their  order  of  approximation  and  therefore 
are  less  stiff  than  other  higher  order  methods,  i.e.  Fourier  or  spline  bases. 
For  instance,  the  six  term  Daubechies  scaling  function  (D6)  can  exactly  rep¬ 
resent  polynomials  through  the  second  degree.  However,  the  actual  scaling 
function  has  only  a  continuous  1.06  derivative. 

We  apply  the  Daubechies  scaling  function  (translates  of)  with  the  stan¬ 
dard  Galerkin  technique  to  define  the  wavelet-Galerkin  method.  We  find  it 
possible,  with  appropriate  implicit  time  differencing,  to  develop  numerical 
methods  for  the  inviscid  Euler  equations.  We  also  develop  solutions  that  use 
implicit  time  differencing  and  the  standard  hyperviscosity. 

For  the  inviscid  calculations  the  numerical  solutions  develop  a  locally  os¬ 
cillatory  structure.  However,  a  simple  three  term  smoothing  removes  the 
oscillations  and  produces  a  smooth  approximation.  The  hyperviscosity  regu¬ 
larized  solutions  are  already  smooth.  We  compare  the  results  of  these  calcula¬ 
tions  by  integrating  over  very  long  times.  The  similarities  and  differences  sug¬ 
gest  several  interesting  consequences  for  two-dimensional  turbulence  [54, 35]. 
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Basically,  the  inviscid  calculations  over  a  long  time  preserve  certain  hy¬ 
perbolic  structures  that  are  not  preserved  by  the  regularized  calculations, 
and  seem  to  better  capture  the  limiting  inviscid  behavior. 

We  have  presented  a  series  of  numerical  experiments  that  illustrate  sev¬ 
eral  phenomena  of  possible  relevance  to  two-dimensional  turbulence  and  its' 
numerical  resolution  [54]. 

We  use  compactly  supported  wavelets  as  a  Galerkin  basis  and  develop 
a  wavelet- Capacitance  Matrix  method  to  handle  boundary  geometry.  We 
have  developed  an  extension  of  the  standard  Capacitance  matrix  method 
that  greatly  reduces  the  numerical  residual  errors  [33,  34].  In  contrast  with 
the  standard  method,  our  method  shows  fast,  even  spectral,  convergence 
at  relatively  coarse  levels  of  discretization.  Furthermore,  for  comparable 
levels  of  discretization  the  rates  of  convergence  appear  to  be  independent  of 
the  geometry.  For  several  geometries  we  have  made  a  detailed  comparison 
of  methods,  examining  accuracy  and  rates  of  convergence.  We  have  also 
developed  Least-Square  versions  of  our  algorithm  for  the  Helmholtz  equation 
in  nonseparable  geometries  and  examined  the  accuracy  and  convergence  of 
these  methods. 

In  summary,  our  numerical  study  of  the  Helmholtz  equation  shows  that: 

•  the  Wavelet-Galerkin/Capacitance  Matrix  method  (The  Wavelet-Capa¬ 
citance  Matrix  Method)  is  stable  and  spectrally  accurate.  These  results 
apply  to  general  nonseparable  domains  and  all  ranges  of  the  parame¬ 
ters. 

•  The  Wavelet  algorithm  is  found  to  obtain  accurate  results  for  prob¬ 
lems  where,  for  instance,  finite  difference  methods  do  not  converge,  or 
converge  slowly,  and  where  Fourier  Spectral  methods  do  not  apply. 

•  For  a  fixed  level  of  discretization,  increasing  the  order  of  the  wavelet 
basis  spectrally  decreases  the  error. 

•  The  rates  of  convergence  in  sup  norm  appear  to  depend  on  the  wavelet 
basis,  DN,  and  discretization,  6x,  as 

•  The  rates  of  convergence  in  sup  norm  appear  to  be  independent  of  the 
domain  shape. 
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•  Least- Square  versions  of  the  wavelet  algorithm  can  preserve  accuracy 
and  decrease  the  computation  by  more  than  a  factor  of  four.  The  finite 
difference  algorithms  would  not  allow  effective  least-squares  implemen¬ 
tations. 

Furthermore, 

•  All  errors  (accuracy  and  convergence)  are  measured  in  the  pointwise 
sup  norm. 

•  Our  implementation  is  fast,  since  it  is  based  on  fast  (FFT)  evaluations 
for  periodic  geometry  adapted  to  nonseparable  geometry. 

•  The  basic  algorithm  applies  to  one,  two  and  three  space  dimensions, 
without  essential  modification. 

•  For  the  Euler  and  Navier-Stokes  equations  fast  wavelet  algorithms  have 
been  developed  for  flow  in  nonseparable  domains. 

5  Channel  Coding 

Tzannes  and  others  at  Aware  [42,  43]  have  developed  innovative  applica¬ 
tions  of  wavelets  to  error  correcting  codes  in  communications  channels.  In 
particular,  the  wavelet  channel  coding  algorithm  presented  in  [42]  provides 
error  correction  in  additive  white  Gaussian  noise  burst  noise  and  flat  fading 
channels.  We  have  carried  out  simulations  which  verify  these  performance 
gains. 
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6  Appendices 

6.1  About  Aware,  Inc. 

Aware,  Inc.  is  a  mathematical  engineering  company  that  designs  and  sells 
software  and  chipset  products  for  bandwidth  management.  Aware’s  prod¬ 
ucts  organize  and  compress  signals  such  as  images  and  video,  audio  and 
speech,  for  storage  and  communications  applications.  Based  on  the  power¬ 
ful  new  wavelet  mathematics,  which  provides  superior  signal  representation 
and  hierarchical  data  structures.  Aware  accomplishes  this  by  compressing 
the  amount  of  computation  and  storage  needed  to  represent  the  signal,  and 
managing  the  compressed  data  stream  to  create  breakthrough  products  for 
satellite  communications,  teleconferencing,  video  editing  and  distribution, 
computer  multimedia  systems,  mobile  radios,  cellular  telephones,  and  secure 
communications. 

The  Changing  Communications  Environment 

A  revolution  in  information  processing  has  been  made  possible  by  new  t- 
elecommunications  and  computing  technology.  Digital  formats  enable  com¬ 
puter  generated  pictures  and  sound,  and  conventionally  generated  photograph- 
s,  video,  audio  and  speech  communications  to  be  modified  by  computer  edit¬ 
ing  and  to  interchangeably  share  the  same  storage  media  and  communications 
channels.  The  boundaries  between  markets  that  were  once  distinct  and  in¬ 
dependent  are  blurring,  and  product  lines  are  overlapping.  Companies  are 
evolving  new  types  of  organization  and  skills  to  respond  to  the  new  commu¬ 
nication  synthesis. 

Bandwidth  Management 

The  common  denominator  of  digital  computing  and  digital  telecommunica¬ 
tions  is  bandwidth.  Bandwidth  is  the  measure  of  how  much  information  can 
be  transmitted  or  stored  by  an  information  system.  It  is  the  fundamental 
scarce  resource  whose  value  is  reflected  in  technology  trends  and  produc- 
t  pricing.  The  need  for  more  communications,  more  information  storage, 
and  more  computing  resources  in  business  and  consumer  products  is  press¬ 
ing  against  bandwidth  scarcity.  Bandwidth  management  is  the  key  tool  for 
enabling  new  products,  and  for  achieving  higher  productivity  and  profits 
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from  existing  systems. 

Bandwidth  can  be  increased  by  using  more  expensive  technology:  su¬ 
percomputers,  large  disk  drives,  higher  frequency  communications  carrier 
signals.  Each  of  these  methods  increases  the  amount  of  usable  bandwidth. 
Or,  the  available  bandwidth  can  be  used  more  efficiently  by  compressing  the 
source  information. 

Compression  reduces  the  amount  of  bztndwidth  required  to  do  the  task. 
Compression  doesn’t  depend  on  the  technology  of  computers  or  storage  media 
or  communications  equipment:  it  just  reorganizes  the  information  that  the 
customer  uses  in  a  much  more  efficient  way. 

6.2  Government  contracts  related  to  this  contract 

During  the  period  of  this  contract,  other  DoD  contracts  that  depend  on  it 
were  awarded  to  Aware.  They  include: 

•  Spread  Spectrum:  Subcontract  to  Atlantic  Aerospace  Electronics  Cor- 
p.,  based  on  Aware’s  wavelet  channel  coding  technology  (Contract  No. 
AAEC-1214.046-91-001). 

•  ONR  Wavelet  Transform  Chip:  Design  and  fabricate  a  wavelet  trans¬ 
form  processor  chip  to  demonstrate  VLSI  wavelet  technology  (Contract 
No.  N00014-90-C-0167). 

•  ONR  Parttial  Differential  Equations  Research:  (Contract  No.  N00014- 
91-C-0086). 

6.3  Aware  reports  and  presentations  supported  in 
part  by  this  contract 

6.3.1  Reports  and  publications  supported  in  part  by  this  contract 

The  following  is  a  list  of  technical  reports  whose  preparation  was  supported 
in  part  by  this  contract. 

•  R.  Glowinski,  **Note  on  a  Multiplier-Fictitious  Domain  Method  for  the 
Numerical  Solution  of  the  Diridilet  Problem,”  Aware  Technical  Report 
(1990) 
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•  R.  Glowinski,  W.  Lawton,  M.  Ravachol  and  E.  Tenenbaum,  “Wavelet 
Solution  of  Linear  and  Nonlinear  Elliptic,  Parabolic  and  Hyperpolic 
Problems  in  One  Space  Dimension.”  in  Proceedings  of  the  9th.  Inter¬ 
national  Conference  on  Numerical  Methods  in  Applied  Sciences  and 
Engineering  SIAM,  Philadelphia  (1990) 

•  Heller,  P.  N.  “Higher  multiplier  Daubechies  wavelets,”  Aware  Technical 
Report  AD910614  (1991). 

•  Heller,  P.  N.  “A  construction  of  higher  multiplier  wavelet  matrices”, 
Aware  Technical  Report  AD910728  (1991). 

•  Heller,  P.  N.  “Polynomials  are  generalized  eigenfunctions  of  the  wavelet 
transform,”  Aware  Technical  Report  AD910912  (1991). 

•  Heller,  P.  N.  “Polynomials  are  generalized  eigenfunctions  of  the  wavelet 
transform,”  Aware  Technical  Report  AD910912  (1991). 

•  Heller,  P.  N.,  Resnikoff,  H.  L.,  and  Wells,  R.  0.  Jr.,  “Wavelet  matrices 
and  the  representation  of  discrete  functions”,  to  appear  in  Wavelets:  a 
tutorial^  C.K.  Chui,  ed..  Academic  Press,  1991. 

•  Heller,  P.  and  Resnikoff,  H.  L.,  “Polynomials  are  generahzed  Eigenfunc¬ 
tions  of  the  wavelet  transform,”  Aware  Technical  Report  AD910912. 

•  Heller,  P.,  “Higher  rank  Daubechies  wavelets  -  Preliminary  report,” 
Aware  Technical  Report  AD911204. 

•  Huffman,  J.  C.,  Zettler,  W.  and  Linden,  D.  C.  P.,  “Applications  of 
compactly  supported  wavelets  to  image  compression,”  Proc.  SPIE, 
vol.  1244,  pp.  150-160,  (1990). 

•  Jagler,  K.  B.  and  Morrell,  W.,  “The  application  of  multiresolution 
wavelet  techniques  to  interference  cancellation  problems”,  Report  to 
MRJ  (1991). 

•  Lawton,  W,,  “Tight  frames  of  compactly  supported  afBne  wavelets”. 
Aware  Technical  Report  AD891012. 
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•  Latto,  A.  and  Tenenbaum,  £.,  ‘‘Compactly  supported  wavelets  and  the 
numerical  solution  of  Burgers'  equation,"  C.  R.  Acad.  Sci.  France, 
Serie  I,  pp.  903-909  (1990). 

•  Lawton,  W.,  “Necessary  and  sufficient  conditions  for  constructing  or¬ 
thonormal  wavelet  bases,"  J.  Mathematical  Physics,  vol.  32,  pp.  57-61 
(1991). 

•  Lawton,  W.,  “Tight  frames  of  compactly  supported  afiOne  wavelets". 
Aware  Technical  Report  AD891012,  (1989). 

•  Lawton,  W.  and  Resnikoff,  H.  L.,  “Multidimensional  wavelet  bases," 
Aware,  Inc.,  Technical  Report  No.  AD910130  (1991). 

•  Pollen,  D.,  “5t/i(2,  F[2:,  j])  for  F  a  subfield  of  C,"  J.  of  the  Amer. 
Math.  Soc.,  vol.  3,  pp.  611-624,  (1990). 

•  Pollen,  D.,  “Parametrization  of  compactly  supported  wavelets".  Aware 
Technical  Report  AD890503.4.1  (1989). 

•  Pollen,  D.,  “Linear  one-dimensional  scaling  functions”,  Aware  Techni¬ 
cal  Report  AD900104  (1990). 

•  Pollen,  D.  and  Linden,  D.,  “Quadratic  one-dimensional 

•  Resnikoff,  H.  L.,  “Wavelets  and  adaptive  signal  processing,"  to  appear 
in  SPIE  International  Symposium  Proceedings,  Vol.  1565,  Adaptive 
signal  processing,  October,  1991. 

•  Resnikoff,  H.  L.,  “Wavelets  and  adaptive  signal  processing,”  Optical 
Eng.  31(6),  June,  1992. 

•  Resnikoff,  H.  L.  and  Wells,  R.  O.,  “Wavelet  analysis  and  the  geometry 
of  Euclidean  domains" ,  J.  of  Geometry  and  Physics,  vol.  8,  pp.  273-282 
(1992). 

•  Weiss,  J.,  “Wavelets  and  the  dynamics  of  enstrophy  transfer  in  two 
dimensional  hydrodynamics,"  Aware  Technical  Report,  in  progress, 
(June,  1990). 
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•  Weiss,  J.,  “The  long  term  limit  of  two-dimension2J  Euler  flow,”  Aware 
Technical  Report  AD911029. 

•  J.  Weiss,  “Wavelets  and  the  Study  of  Two  Dimensional  Turbulence,”  in 
the  Proceedings  of  the  French-USA  Workshop  on  Wavelets  and  Turbu¬ 
lence,  Princeton  University,  June  1991.  Y.  Maday,  Ed.  Springer- Verlag, 
NY. 

6.3.2  Related  reports 

The  following  report  was  largely  prepared  by  Aware  under  subcontract  to 
Martin-Marietta.  It  reports  work  based  on  research  supported  by  the  con¬ 
tract  for  which  present  document  is  the  Final  Report. 

•  Stirman,  C.,  “Applications  of  wavelets  to  radar  data  processing,”  Final 
Technical  Re;  under  DARPA  Order  No.  7450  monitored  by  AFOSR 
under  contract  No.  F49620-90-C-0050.  Martin-Marietta  Electronics, 
Information,  and  Missiles  Group,  July  1991. 

6.4  Wavelet  technology  in  the  popular  press 
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Preface 

This  monograph  is  intended  to  provide  the  reader  interested  in  mathemati¬ 
cal  engineering  with  an  informal  tour  of  the  theory  of  compactly  supported 
wavelets,  its  applications,  and  what  it  tells  us  about  the  relationship  be¬ 
tween  the  discrete  and  the  continuous  in  mathematical  models  of  physical 
systems. 

Its  content  is  largely  based  on  the  work  of  my  present  and  past  colleagues 
at  Aware,  who  are  in  large  measure  responsible  for  the  many  results  it 
contains.  I  of  course  assume  all  responsibility  for  errors  (of  omission  as  well 
as  commission)  but  in  this  instance  I  could  not  have  made  some  of  those 
errors  without  their  help!  The  book  also  is  indebted  to  other  explorers  of 
the  vast  new  territory  that  the  concept  of  compactly  supported  wavelets  has 
revealed.  I  have  acknowledged  their  contributions  where  they  have  played 
a  part  in  the  story  I  have  to  tell. 

I  am  pleased  to  express  my  appreciation  to  my  former  colleagues  Wayne 
Lawton,  David  Pollen,  David  C.  Plummer  (Linden)  and  Eric  Tenenbaum, 
whose  early  and  fundamental  contributions  to  the  development  of  wavelet 
theory  at  Aware  have  profoundly  influenced  the  way  I  think  about  the 
subject,  and  to  Ramesh  Gopinath  and  Professors  C.  S.  Burrus  and  Roland 
Glowinsky  for  their  insights  and  stimulation  during  their  extended  visits  to 
the  company.  Discussions  with  Professor  Louis  Auslander  during  a  period 
of  more  than  three  years  stimulated  the  author's  mathematical  imagination 
and  introduced  Aware  to  the  potential  of  wavelets  for  channel  coding. 

Special  thanks  are  due  to  Jonathan  Devine,  Peter  Heller,  John  Huffman, 
Karl  Jagler,  William  Morrell,  Richard  Tolimieri,  Michael  Tzannes,  John 
Weiss,  and  Bill  Zettler,  and  to  my  friend  and  colleague  Ronny  Wells,  all  of 
whom  will  recognize  their  hands  (and  heads)  throughout  this  book,  and  to 
the  rest  of  the  staff  of  Aware  whose  contributions  to  this  book  have  been 
considerable  even  though  they  may  be  less  visible. 

I  am  particularly  grateful  to  Christina  Gorecki  who  kept  the  unruly 
team  of  Tex,©  Matheinatica,©  a-  i.h  Macintosh  operating  system  hitched 
together  and  more  or  less  pulling  ii.  a  common  direction  as  a  sideline  while 
she  performed  her  real  duties. 

Aware,  Inc.  is  indebted  to  the  Defense  Advanced  Research  Projects 
Agency  for  its  far-sighted  support  which  enabled  Aware  to  explore  fun¬ 
damental  issues  at  the  interface  of  mathematics  and  computation  in  the 
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context  of  developing  practical  solutions  to  difficult  and  important  prob¬ 
lems.  We  believe  that  this  partnership  of  common  interests  between  small 
commercial  companies  and  government  agencies,  between  mathematicians 
and  engineers,  and  between  theory  and  practice  will  be  the  paradigm  for 
developing  advanced  technology  in  the  twenty  first  century. 


Collected  below  are  certain  symbols  used  in  the  text. 


Z 

Q 

D 

R 

C 

L\R) 

I2([0,l]) 

Ig* 


The  integers  Z  2,  —1,0, 1,2, . . . ,  n . . .}. 

The  rational  numbers  Q  :=  {m/n  where  m,  n  €  Z  and  n  ^  0}. 

The  dyadic  raiionals  D  ;=  {q/2’'  :  9  €  Z  an  odd  integerand  n  €  Z}. 
The  real  numbers. 

The  complex  numbers. 

The  space  of  finite  energy  signals  on  R. 

The  space  of  finite  energy  signals  on  the  interval  [0, 1]. 

The  base  2  logarithm  of  x. 


The  imaginary  unit  is  i  :=  y/l. 
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Figure  1:  Aware’s  logo.  The  square  is  a  map  of  the  two  dimensional  pinched 
torus  parameter  space  of  six  coefficient  wavelet  matrices.  The  curves  are  the 
loci  of  wavelet  matrices  that  correspond  to  wavelet  bases  that  are  formally 
differentiable  of  order  1. 
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Figure  2:  One  dimensional  row-transformed  Lenna  image  employing  the 
D3  wavelet  basis.  Lenna  was  the  centerfold  in  the  November  1972  issue  of 
Playboy.  This  image,  digitized  from  the  original,  has  become  a  standard  in 
the  image  processing  community,  which  usually,  but  incorrectly,  identifies  it 
as  “Lena.”  Today  Lenna  is  also  known  as  “NITF-6”  in  the  National  Image 
Transmission  Format  test  image  suite. 
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Chapter  1 

The  New  Mathematical 
Engineering 


1.1  Trial  and  Error  in  the  21st  Century 

A  fundamental  break  with  the  past  occurred  during  the  last  forty  years. 
Each  of  us  is  now  dependent  on  technology  for  sustaining  not  only  our  style 
of  Ufe  but  possibly  our  life  itself.  Today  nations  and  corporations  under¬ 
take  projects  whose  scale  and  complexity  were  undreamt  merely  a  genera¬ 
tion  ago.  People  depend  on  products  their  grandpuents  would  have  called 
miracles,  and  they  expect  the  marketplace  to  provide  levels  of  performance, 
safety,  and  affordability  that  require  advances  all  along  the  convoluted  fron¬ 
tier  of  science  and  technology. 

Before  this  break  with  the  past,  most  large  undertakings  differed  in 
degree,  but  not  in  kind,  from  what  had  previously  been  tried  and  mastered. 
New  products  were  tested  in  the  Uboratorg  and  industrial  processes  were 
scaled  up  from  laboratory  trials. 

Improvement  by  trial  and  error  served  past  generations  well,  but  it 
cannot  serve  the  future.  Today,  design  and  manufacturing  antecedents  that 
provide  insight  into  the  long  term  consequences  of  large  scale  projects  are 
usually  lacking.  The  trans- Alaskan  pipeline,  landing  explorers  on  the  iiK>on 
and  bringing  them  home,  assessing  the  crashworthiness  of  new  automobiles, 
and  finding  the  optimal  shape  for  the  wing  of  a  jet  transport  are  problems 
that  cannot  be  solved  by  trial  and  error  or  by  scaling  up  the  results  of 
affordable  small  scale  tests. 

Today,  testing  the  real  thing  is  often  too  costly,  too  time-consuming,  or 
just  too  complicated  to  be  practical.  More  and  more  (rften,  when  a  project 
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involves  human  health,  or  the  interdependence  of  human  well-being  and  the 
well-being  of  the  environment,  testing  the  real  thing  may  not  be  possible. 

Development  by  trial  and  error  was  suited  to  the  time  when  technology 
was  in  its  infancy.  That  approach  is  no  longer  adequate.  The  need  to  pre¬ 
dict  performance  and  consequences,  and  to  optimize  design  for  safety,  qual¬ 
ity,  and  cost  have  become  key  ccunpetitive  and  social  factors  for  industrial 
economies.  These  factors  call  for  new  tools  suited  to  these  new  problems. 
One  important  new  tool  for  predicting  performance  and  optimizing  design 
is  mathematical  engineering. 


1.2  Mathematical  Engineering 

1.2.1  Active  Mathematics 

Traditionally,  mathematics  only  played  a  role  in  the  earliest  stages  of  en¬ 
gineering  design.  Mathematics  was  used,  for  example,  to  prescribe  the 
arrangement  and  strength  of  the  parts  of  a  structural  bridge  design,  or  the 
shape  and  size  of  a  wing  that  would  meet  the  design  objectives  for  lift,  or  to 
test  the  electrical  characteristics  of  a  layout  for  a  VLSI  chip  design.  Once 
the  design  was  complete,  the  role  of  mathematics  was  played  out. 

Today,  the  digital  computer  and  the  VLSI  chip  have  created  a  new 
role  for  mathematics  in  technology.  Mathematical  algorithms  can  now  be 
directly  embodied  in  products,  so  that  every  time  the  product  is  used,  the 
algorithm  is  working.  This  active  mathematics  enables  totally  new  classes 
of  products,  from  the  cellular  telephone,  where  an  algorithm  in  the  form  of 
channel  coding  protects  the  message  from  the  distortions  of  noisy  channels, 
to  compact  audio  discs,  where  error  correction  coding  insures  the  fidelity 
of  music  despite  dust  and  imperfections  of  the  recording  medium.  This 
new  kind  of  use  of  mathematics  will  have  an  increasingly  profound  impact 
on  adaptive  process  controls  in  manufacturing  and  on  real-time  dynamic 
controls  in  products  -  from  automobiles  and  airliners  to  electronic  pocket 
organizers  -  that  are  used  by  everyone. 

Active  mathematics  calls  for  a  new  approach  to  the  use  of  mathematics 
in  engineering  -  a  new  mathematical  engineering  ~  that  employs  a  sys¬ 
tems  approach  to  problems  where  mathematical  process  models,  efficient 
algorithms  for  computer  numerical  solutions,  and  area-efficient  VLSI  inq>le- 
mentations  for  portable  and  computation-intensive  real-tinne  applications 
are  efficiently  combined  to  yield  an  integrated  solutbn. 
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1.2.2  The  Three  Types  of  Bandwidth 

Communications  Bandwidth 

Bandwidth  is  usually  thought  of  in  the  context  of  broadcast  coirnnunica- 
tions,  where  it  is  the  physical  means  by  which  a  radio  (»  television  signal 
carries  information.  In  the  United  States,  the  available  spectrum  is  allo¬ 
cated  by  the  Federal  Communications  Coiiunission  (FCC)  for  signals  that 
propagate  through  the  atmosphere.  Although  bandwidth  is  an  abstract 
concept  that  many  people  find  difficult  to  understand,  from  an  economic 
point  of  view  bandwidth  acts  like  a  commodity.  Like  any  other  commodity, 
bandwidth  can  be  plentiful  or  scarce  relative  to  the  need  for  it.  With  the 
growth  of  telecommunications  and  computing,  communications  and  elec¬ 
tronic  storage  bandwidth  have  become  increasingly  scarce  commodities. 


Information  Storage  Bandwidth 

Information  that  is  stored  on  a  disk  drive  or  any  other  storage  medium  also 
consumes  bandwidth,  but  in  this  case  the  storage  bandwidth  that  the  device 
uses  is  built  into  the  product  in  the  form  of  the  physical  storage  medium. 
For  a  fixed  technology,  the  price  of  the  product  will  generally  increase  with 
the  storage  bandwidth  it  provides.  The  user  pays  for  the  bandwidth. 


Computational  Processing  Bandwidth 

Every  computer  has  a  computational  bandwidth  which  is  roughly  propor¬ 
tional  to  the  number  of  operations  it  can  perform  per  second.  Computa¬ 
tional  bandwidth  is  really  a  measure  of  the  amount  of  computation  that 
a  computer  can  perform  per  unit  time.  The  difference  between  these  two 
definitions  depends  on  the  efficiency  of  the  algorithms  that  are  employed. 
Processing  bandwidth  is  not  strictly  independent  of  conununications  band¬ 
width  and  storage  bandwidth  because  a  computer  combines  logical  process¬ 
ing  operations,  communication  with  its  internal  memory,  and  intermediate 
storage  of  information  in  registers  as  components  of  the  computaticmal  pro¬ 
cess. 

These  three  types  of  bandwidth  -  communications  bandwidth,  informa¬ 
tion  storage  bandwidth,  and  computational  processing  bandwidth  -  are  three 
aspects  of  one  fundamental  concept. 
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Table  1.1:  Spectrum  allocation  in  the  United  States 


Frequency  Band 
(kHz) 

Designation 

Propagation 

Characteristics 

Typical  Uses 

3  X  10“  -  3  X  lO’ 

VLF 

Submarine  communications 

3  X  10^  -  3  X  10* 

LF 

Long-range  navigation; 
Marine  communications 

3  X  10®  -  3  X  10* 

MF 

Atmospheric  noise. 

AM  broadcasting; 

Emergency  frequencies; 
Maritime  radio. 

3  X  10*  -  3  X  10^ 

HF 

Ionospheric 

reflection 

Telephone;  Facsimile; 

Aircraft  and  ship  comm; 
International  broadcasting; 
Military  communications 

3  X  10^  -  3  X  10» 

VHF 

Near  line-of- 
sight;  scatterbg. 

VHF  television; 

FM  two-way  radio; 

AM  aircraft  comm. 

3  X  10®  -  3  X  10*° 

UHF 

Line-of-sight. 

UHF  television; 

Radar; 

microwave  links. 

3  X  10»  -  3  X  10*° 

SHF 

Line-of-sight; 

Rainfall  attenuation. 

Satellite  communications 
Radar. 

3  X  10*°- 3  X  10** 

EHF 

Water  vapor 
absorption. 

Radar; 

Satellite  communicaticms 

10*2  _  ioi« 

Infrared, 
Visible  Light, 
Ultraviolet 

Optical  communications. 
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1.2.3  Communications  Bandwidth:  Mathematical  En¬ 
gineering  for  Channel  Coding 

Digital  signal  processing  for  communication  is  cnie  of  the  areas  where  wavelet 
technology  is  making  a  difference. 

Communication  always  involves  a  sender,  a  receiver,  and  a  “ccHiununi- 
cations  channel,”  which  is  a  means  for  transmitting  the  signal.  A  conver¬ 
sation,  a  radio  broadcast,  a  cable  TV  program,  a  person  reading  a  book, 
a  person  typing  on  a  computer  keyboard,  and  a  computer  transferring  in¬ 
formation  to  a  hard  disk  are  all  examples  of  communication.  In  each  case, 
information  is  transmitted  by  modifying  something  physical  that  can  inter¬ 
act  with  both  the  sender  and  the  receiver.  For  instance,  speaking  causes 
small  systematic  pressure  variations  in  the  air  between  the  speaker  and  the 
hearer,  which  the  hearer’s  ear  translates  or  “decodes”  into  a  stream  of  ner¬ 
vous  impulses  that  are  transmitted  to  the  brain.  The  pressure  variation  of 
air  is  the  conununications  channel  for  transmitting  the  speech  signal.  The 
air  between  the  speaker’s  mouth  and  the  hearer’s  ear  can  be  thought  of  as 
the  “carrier”  of  the  speech  signal;  the  small  pressure  variation  caused  by 
speaking  “modulates”  the  carrier  to  encode  the  information  signal  on  it. 

Channel 

For  broadcast  radio  and  TV,  the  channel  is  the  electrical  variation  of  an 
electromagnetic  wave  broadcast  from  the  truismitter  tower.  The  electro¬ 
magnetic  wave  is  the  carrier.  The  small  electrical  variations  that  represent 
the  information  signal  modulate  the  carrier  by  changing  its  physical  prop¬ 
erties.  Amplitude  modulation,  used  for  AM  radio,  modifies  the  energy  of 
the  electromagnetic  carrier  wave,  while  frequency  nx>dulation,  used  for  FM 
broadcasts  and  TV,  modifies  its  frequency. 

Noise 

While  a  signal  is  passing  from  the  source  to  the  receiver,  it  is  at  the  mercy  of 
the  surrounding  physical  environment.  A  lightning  flash  will  drown  out  an 
AM  radio  signal  just  as  a  drum  roll  will  drown  out  speech.  The  information 
signal  is  still  there,  but  it  is  masked  by  the  larger  signal  and  cannot  be 
detected  by  the  receiver,  just  as  the  ear  cannot  detect  a  softer  tone  masked 
by  a  nearby  louder  one.  “Noise”  is  simply  an  unwanted  modulation  of  the 
carrier  whose  presence  interferes  with  detection  of  the  desired  signal.  One 
person’s  noise  is  another  person’s  signal.  The  boom  boxer  thinks  the  nearby 
conversation  is  noise.  Other  people’s  conversations  audible  on  a  telephone 
line  are  usually  placed  into  the  noise  category.  The  telephone  company 
calls  this  “co-channel  interference.” 
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Noise  -  interference  -  is  always  present  in  every  communications  chan¬ 
nel,  and  it  corrupts  every  communication.  It  causes  the  receiver  to  make 
errors  in  decoding  the  transmitted  signal.  These  channel  errors  limit  the 
efficiency  of  the  coimnunications  system. 


Channel  Capacity 

Every  communications  channel  can  transmit  a  certain  amount  of  infor¬ 
mation  but  no  more.  The  number  of  bits  per  second  that  a  channel  can 
transmit  is  called  its  channel  capacity.  Channel  edacity  decreases  as  the 
amount  of  noise  increases,  and  it  increases  as  the  amount  of  energy  used  to 
modulate  the  carrier  increases.  For  radio  channels,  the  channel  capacity  is 
proportional  to  the  channel’s  frequency  bandwidth.  Bandwidth  is  another 
name  for  channel  capacity,  and  channel  coding  is  an  important  part  of  a 
strategy  for  managing  bandwidth  efficiently. 

The  notion  of  channel  capacity  is  very  general  and  can  be  used  to  de¬ 
scribe  any  communications  system.  Commercial  airline  traffic  supplies  an 
easily  understood  example.  The  travelers  on  the  airplane  represent  the  in¬ 
formation  bits  and  the  aircraft  is  the  carrier.  The  route  through  the  sky 
is  the  channel.  In  regulated  international  air  travel,  airline  routes  are  a 
valuable  commodity.  Increasing  the  number  of  flights  between  two  airports 
is  one  way  of  increasing  channel  throughput  to  increase  revenues.  The  ca¬ 
pacity  of  the  channel  is  determined  by  how  many  aircraft  can  occupy  the 
route  at  the  same  time. 


Channel  Coding 

We  will  pursue  this  example  further.  The  number  of  airplanes  that  could  oc¬ 
cupy  the  routes  between,  for  example,  Boston  and  London,  is  much  greater 
than  the  number  that  are  permitted  to  fly,  because  government  regulations 
require  aircraft  to  maintain  a  minimum  separation  in  both  space  and  time. 
This  minimum  distance  guarantees  reliable  air  travel.  It  is  a  form  of  channel 
coding. 

A  communications  link  needs  to  be  reliable.  The  importance  of  the 
information  being  transmitted  determines  how.  Reliability  is  usually  mear 
sured  by  the  average  bit  error  rate,  i.e.,  the  number  of  bit  errors  that,  on 
the  average,  are  tolerable  for  the  ^plication.  For  radio  communications, 
links  employing  digital  data,  an  error  rate  of  one  in  every  ten  thousand  bits 
may  be  acceptable.  In  the  case  of  airplane  travel,  no  error  -  no  loss  of  life 
-  is  acceptable,  so  the  distance  between  airplanes  in  flight  is  kept  relatively 
large. 
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Cars  traveling  on  a  road  maintain  as  great  an  inter-car  distance  as  air¬ 
planes  in  flight.  Less  distance  between  them  is  required  because  the  road 
channel  is  ‘^ess  noisy”  (events  happen  more  slowly),  and  the  consequences 
of  an  occasional  error  are  usually  not  catastrophic.  The  distance  between 
vehicles,  although  it  wastes  part  of  the  channel  capacity,  is  needed  to  main¬ 
tain  “reliable  coimnunications.” 

Channel  coding  is  a  sophisticated  means  of  placing  distance  between 
information  signals  by  putting  redundancy  in  the  information.  This  is  the 
only  way  to  avoid  or  correct  errors  that  arise  from  channel  noise. 

1.2.4  Information  Storage  Bandwidth:  Mathematical 
Engineering  for  Compression 

There  are  two  kinds  of  compression  -  “lossless”  and  “lossy” . 

Lossless  Compression 

Lossless  r  impression  is  sometimes  called  “arithmetic  coding”  or  “entropy 
coding”  b  .ause  it  never  destroys  information.  Lossless  compression  merely 
removes  the  redundancy  in  a  signal.  Since  it  never  destroys  information,  the 
original  signal  can  be  exactly  reconstructed  from  the  losslessly  compressed 
version.  But  most  signals  do  not  have  very  much  redundancy.  This  limits 
the  effectiveness  of  lossless  compression  which,  for  gray  scale  images,  tends 
to  produce  compression  ratios  of  only  2-  or  3-  to  -1  [44]. 

Lossless  compression  is  like  compressing  a  gas  in  a  cylinder  by  means 
of  a  piston:  all  the  gas  molecules  are  still  there,  and  the  gas  returns  to  its 
ori^nal  condition  when  the  pressure  is  released  by  withdrawing  the  piston. 
The  compressed  gas  requires  a  smaller  volume  to  store  the  same  number 
of  gas  molecules.  The  cost  of  compression  and  subsequent  decompression 
when  the  gas  is  used  is  repaid  by  the  savings  that  are  byproducts  of  the 
reduced  volume  for  shipment  and  storage. 

Lossless  compression  of  information  works  the  same  way.  Information 
expressed  in  digital  form  as  a  sequence  of  bits  can  be  reorganised  so  that 
the  same  information  can  be  expressed  in  terms  of  fewer  bits  by  removing 
the  redundancy.  The  compressed  format  is  usually  not  directly  usable; 
when  the  information  is  used,  it  will  be  wanted  in  its  uncompressed  form. 
The  cost  of  lossless  compression  and  decompression  must  be  repaid  by  the 
savings  that  are  byproducts  of  the  smaller  amount  of  storage  or  transmission 
time  required  by  the  compressed  information  if  lossless  compression  is  to  be 
commercially  worthwhile.  Lossless  compression  is  used  when  it  is  important 
to  be  able  to  reconstruct  the  signal  exactly;  no  errors  are  permitted.  The 
use  of  lossless  compression  for  financial  data  records  is  a  typical  application. 
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The  critical  difference  between  compressing  a  gas  and  compressing  in¬ 
formation  is  that  there  is  a  limit  to  how  far  information  can  be  losslessly 
compressed.  The  amount  of  information  in  a  signal  is  the  number  of  bits 
that  are  needed  to  express  the  information  in  its  most  compressed  form. 
If  the  signal  is  compressed  further  (e.g.,  by  throwing  away  some  of  the  re¬ 
maining  bits),  then  some  information  present  in  the  original  signal  will  be 
lost. 


Lossy  Compression 

Lossy  compression  selectively  discards  less  important  information  from  the 
signal.  This  permits  much  higher  compression  ratios  and  often  -  it  depends 
on  the  application  -  there  will  be  little  or  no  perceptible  difference  between 
the  original  signal  and  the  signal  constructed  from  the  compressed  version. 
Lossy  compression  for  full  color  still  images  can  be  30  times  as  great  as 
lossless  compression  without  producing  significant  distortion.  This  is  pos¬ 
sible  because  images  typically  contain  “noise”  and  other  details  that  are 
subliminal  and  therefore  not  perceived  by  the  ^iewer.  Speech,  sound,  and 
imagery  can  all  be  compressed  by  lossy  compression  to  a  far  greater  degree 
than  by  lossless  compression. 

Lossy  compression  discards  “less  important”  information.  The  critical 
feature  of  a  lossy  compression  method  is  the  way  it  decides  which  informa¬ 
tion  is  important.  The  decision  generally  depends  on  the  application,  but 
for  sensory  data  like  speech,  sound,  and  imagery,  all  of  which  are  processed 
by  people,  there  are  similarities  that  make  it  possible  to  design  compression 
algorithms  that  have  a  common  general  structure. 

1.2.5  Computational  Bandwidth:  Mathematical  Engi¬ 
neering  for  Manufacturing 

Some  of  the  most  demanding  applications  for  mathematical  engineering  are 
problems  of  manufacturing.  Shortening  product  design  cycles  and  improv¬ 
ing  the  performance  of  manufactured  products  are  necessary  for  economic 
competitiveness.  They  directly  affect  manufacturing  costs  and  the  ability  to 
be  responsive  to  changes  in  consumer  taste,  the  two  short-term  factors  that 
determine  profitability.  The  health  and  safety  effects  of  products  will  be¬ 
come  increasingly  important  as  products  become  ever  more  complex.  Other 
things  being  roughly  equal,  economies  and  companies  that  are  able  to  lead 
in  these  aspects  will  be  more  successful  than  those  that  cannot. 

It  is  not  easy  to  shorten  design  time  and  improve  product  performance 
because  today’s  manufactured  products  are  much  more  complex  than  those 
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of  just  a  few  decades  ago.  FYom  an  inexpensive  television  set  to  a  multimil- 
lion  dollar  gas  turbine  jet  engine,  most  manufactured  products  are  now  too 
complicated  for  old  fashioned  design  procedures.  The  old  procedures  cost 
too  much;  they  take  too  long;  and  they  are  incompatible  with  interactive 
performance  simulation  that  is  necessary  to  optimize  designs  for  reduced 
manufacturing  cost  and  increased  quality. 

The  traditional  role  of  mathematics  in  engineering  design  and  its  new 
active  role  in  product  operation  can  be  combined  by  interactively  using 
computer-implemented  mathematical  simulations  to  analyze  and  improve 
product  performance.  Although  the  computer  is  already  a  principal  tool  in 
many  industries,  and  has  been  used  to  make  great  strides  in  organizing  and 
controlling  manufacturing  processes,  its  interactive  use  to  optimize  product 
design  and  reduce  design  cycle  time  is  stUl  untapped. 

At  the  heart  of  the  product  design  process  is  the  ability  to  represent 
the  operation  of  a  product  by  a  mathematical  process  model.  The  pro¬ 
cess  model  underpins  the  ability  to  simulate  performance  by  ‘Running”  the 
mathematical  model  on  a  computer. 

If  a  mathematical  simulation  is  to  be  useful  for  design,  it  must  be  accu¬ 
rate  and  fast;  if  it  is  to  be  practical,  it  must  run  on  inexpensive  computers. 
Typically,  the  most  difficult  and  costly  part  of  mathematical  simulation  for 
manufacturing  involves  nonlinear  processes,  like  combustion,  fluid  flow,  and 
forming  metals  and  plastics.  In  this  context,  the  mathematical  description 
usually  consists  of  a  collection  of  partial  differential  equations  that  model 
the  physical  properties  of  the  product.  The  numerical  solution  of  these 
equations  describes  the  changes  that  occur  as  the  product  performs  its 
function. 

Although  the  speed  of  computers  will  continue  to  increase  in  the  coming 
years,  this,  by  itself,  will  not  be  sufficient  to  put  the  power  of  interactive 
simulation  in  the  hands  of  every  product  design  and  manufacturing  engi¬ 
neer  who  could  benefit  from  it.  Unless  the  product  is  very  expensive  or 
the  number  of  units  manufactured  is  very  large,  most  companies  cannot 
afford  the  cost  of  the  powerful  computer  or  supercomputer  that  would  be 
needed  to  provide  accurate  and  timely  simulation  results  if  the  the  current 
generation  of  simulation  techniques  are  used.  This  means  that  in  a  period 
of  increasing  product  customization,  the  vital  "middle  class”  of  the  manu¬ 
facturing  sector  that  consists  of  medium  sized  engineering  companies  will 
not  be  competitive. 

Affordable  solutions  to  the  interactive  design  problem  are  likely  to  be 
based  on  a  combination  of  new  and  more  accurate  mathematical  models  for 
representing  the  product’s  operation,  and  more  accurate  and  more  stable 
numerical  methods  for  solving  the  resulting  differential  equations  quickly 
enough  to  support  interactive  design  solutions.  These  advances  would  en- 
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able  simulations  that  can  reduce  the  time  to  solution,  be  i4;>plied  to  cases 
where  conventional  methods  fail,  and  are  interactive  and  economical. 


1.3  Are  Breakthroughs  on  the  Way? 

Is  there  a  scientific  basis  for  the  breakthroughs  that  will  be  required  to 
support  active  mathematics  in  complex  products  and  for  interactive  de¬ 
sign?  We  believe  there  is.  The  mathematical  theory  of  compactly  sup¬ 
ported  wavelets  was  discovered  just  a  few  years  ago,  but  wavelet  numerical 
solution  of  nonlinear  differential  equations  has  already  demonstrated  its 
potential,  and  wavelet-based  active  mathematics  have  already  shown  their 
ability  to  support  dynamic  bandwidth  management  for  digital  coimnunica- 
tion,  storage,  and  high  quality  compression  of  audio  and  imagery. 

Nonlinear  and  transient  physical  phenomena  that  exhibit  nonlinear  be¬ 
havior  such  as  combustion  processes  in  an  automobile  engine,  or  air  flow 
over  a  turbine  blade  in  a  jet  engine  are  difflcult  to  simulate.  Systems  like 
these  exhibit  shock  behavior,  with  turbulence  and  detailed  structure  at 
many  levels  of  resolution.  Conventional  numerical  procedures  require  very 
fine  meshes  and  very  small  time  steps  to  insure  the  accuracy  of  numerical 
solutions,  although  they  do  not  insure  that  the  numerical  procedures  will  be 
stable,  or  produce  accurate  results.  Wavelet  numerical  solutions,  however, 
are  well  suited  to  these  types  of  problems.  They  are  able  to  capture  the 
behavior  of  complex,  nonlinear  dynamical  systems  with  an  accuracy  and 
speed  not  possible  with  known  alternative  techniques. 

In  this  book  we  introduce  the  reader  to  the  ideas  that  lie  behind  the 
theory  of  compactly  supported  wavelets.  We  relate  them  to  previously 
known  methods  in  mathematics  and  engineering;  show  how  they  can  be 
practically  used  in  a  digital  signal  processing  and  computing  environment, 
and  illustrate  their  potential  for  mathematical  engineering  by  describing 
successful  applications  in  bandwidth  management. 


Chapter  2 

Good  Approximations 


“An  addition  to  knowledge  is  won 
at  the  expense  of  an  addition  to  ignorance.” 

— Arthur  S.  Eddington  * 


2.1  Approximations  and  the  Perception  of 
Reality 

Every  measurement,  whether  the  naked  result  of  u  impression  on  the  hu¬ 
man  eye  or  ear,  or  the  result  of  a  sophisticated  measuring  instrument,  is 
merely  an  approximation.  We  can  know,  and  our  computing  machines  can 
know,  only  a  finite  number  of  decimal  places  in  the  numerical  representa¬ 
tion  of  a  distance  or  a  weight  or  a  force  or  a  temperature.^  The  eye  has 
limited  resolving  power,  the  ear  a  limited  frequency  response,  and  this  is 
true  for  all  instruments  whether  biological  or  “mechanical.”  This  limita¬ 
tion  is  not  merely  due  to  poor  “manufacturing  technique;”  as  Heisenberg’s 
Uncertainty  Principle  tells  us,  it  is  inherent  in  the  essence  of  things.  Life 
and  the  universe  depend  on  approximations.  And  so,  too,  does  technology. 

The  certainty  of  error  in  every  measurement  and  every  physical  inter¬ 
action  emphasizes  the  importance  of  knowing  how  accurate  a  particular 
approximation  happens  to  be.  If  an  approximate  value  is  accurate  enough 
for  the  purpose  in  hand,  it  is  “good.”  Other  things  being  equal,  a  bet¬ 
ter  approximation  is  preferred  to  a  worse  one.  How  good  is  a  particular 
approximation?  For  a  fixed  expenditure  of  computational  resources,  mea¬ 
surements  and  analyses  expressed  in  terms  of  compactlg  supported  wavelets 

•[10] 

^It  foUowf  that  every  meamired  or  esplidtly  computed  mimber  is  a  rafioaa/  mindber. 
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Figure  2-1:  Continuous  ramp  transient. 


give  better  approximations  to  speech  signals,  turbulence  and  other  tran¬ 
sient  or  localized  phenomena  than  conventional  methods.  Here  is  a  simple 
example:  consider  a  time  series  that  describes  a  quantity  that  is  zero  for  a 
long  time;  ramps  up  linearly  to  a  maximum  value;  and  falls  instantaneously 
to  zero  where  it  remains.  Such  a  signal  is  shown  as  the  smooth  curve  in 
figure  2-1. 

Suppose  this  signal  is  sunpled  at  55  uniformly  spaced  times.  The  sam¬ 
ple  measurements  can  be  used  to  construct  a  Fourier  series  expansion  of 
the  signal;  if  all  55  measurements  are  used,  then  the  Fourier  series  will 
have  55  terms  and  it  will  perfectly  reproduce  the  measured  numbers.  It 
will  also  interpolate  values  for  unmeasured  instants.  However,  if  fewer  than 
55  terms  are  used,  then  the  partial  series  will  produce  an  approximation 
of  the  measured  values.  The  figure  shows  the  ^jproximate  values  pro¬ 
vided  by  a  27  term  Fourier  expansion.  Figure  2-3  shows  a  27  term  wavelet 
series  approximation.  These  figures  display  two  important  prcqperties  of 
wavelets:  Wavelet  aeries  approximate  abrupt  transitions  muck  more  accu¬ 
rately  than  Fourier  aeries  and  Wavelet  series  perfectly  reproduce  constant 
measurements.  Wavelets  produce  better  approximations  but  a  wavelet  ap¬ 
proximation  doesn’t  cost  more  to  calculate  than  an  ordinary  approximation. 
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Figure  2-2;  Fifty-five  point  ramp  transient;  27  term  Fourier  series  approx¬ 
imation. 


Figure  2-3;  Fifty-five  point  ramp  transient;  27  term  D3  wavelet  series  low 
pass  approximation. 
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2.1.1  Efficient  Mathematical  Models 

We  can  expect  wavelets  to  provide  better  approximations  when  the  data 
exhibit  localized  variations,  but  when  the  data  vary  regularly  and  smoothly 
other  more  traditional  representations  may  be  better.  For  instance,  if  a 
violin  string  is  plucked,  most  of  the  energy  will  be  concentrated  near  the 
plucked  point  for  a  short  period  of  time  -  here,  the  wavelet  series  will 
provide  an  accurate  and  economical  approximation.  But  as  time  passes, 
the  energy  of  the  pluck  will  spread  along  the  string  and  be  distributed 
among  the  normal  modes  of  oscillation,  which  are  sinusoidal.  After  a  while, 
a  Fourier  series  will  provide  a  more  economical  approximation. 

This  example  shows  that  it  is  important  to  know  how  to  use  wavelet 
approximations  and  traditional  approximations  in  the  same  problem. 

Compact  support 

One  reason  that  wavelets  provide  good  approximations  for  transient  or 
localized  phenomena  is  that  each  basis  function  -  each  term  -  in  a  wavelet 
series  has  compact  support  and,  no  matter  how  short  an  interval  is,  there 
is  a  basis  function  whose  support  is  contained  within  that  interval.  The 
intuitive  meaning  of  this  property  is  that  compactly  supported  wavelet 
basis  functions  can  model  local  behavior  efficiently  because  they  are  not 
constrained  by  properties  of  the  data  far  away  from  the  location  of  interest. 

This  feature  of  compactly  supported  wavelets  also  makes  it  easy  to  con¬ 
centrate  computation  where  the  activity  is  high. 

Compactly  supported  bases  lead  to  inherently  parallelizable  algorithms. 
Aware’s  collaborators  at  Rice  University  recently  demonstrated  that  wavelet 
methods  for  the  numerical  solution  of  a  class  of  nonlinear  partial  differential 
equations  could  be  run  on  a  massively  parallel  computer  with  little  change 
in  the  algorithm  or  program  design.  They  led  to  an  efficient  use  of  the 
parallel  computer,  and  an  accurate  solution  of  the  problem. 

Orthogonality 

The  terms  in  a  wavelet  series  are  orthogonal  to  one  another,  just  like  the 
terms  in  a  Fourier  series.  This  means  that  information  carried  by  one 
term  is  independent  of  information  carried  by  any  other  term:  there  is  no 
redundancy  in  the  representation.^  This  is  good  because  it  means  that 

^More  pr«citely>  there  ia  no  redundancy  with  lecpect  to  the  L^(R.)  norm  in  tenw  of 
which  orthog<»ulity  i*  defined.  There  may  be  other  type*  of  reUtiooahip*  among  the 
coefficient*  of  a  wavelet  expanaion  that  would  permit  a  further  reduction  of  redundancy, 
but  they  would  lie  outaide  the  realm  of  aquare-integraUe  orthogonality.  This  i*  a  general 
queation  that  hat  nothing  in  particular  to  do  with  wavelet*. 
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neither  computing  cycles  nor  storage  are  wasted  as  a  result  of  coefficient 
redundancy  when  a  wavelet  series  is  calculated  or  stored. 


Multiresolution  representation 

Biological  sensory  systems,  such  as  vision,  and  many  physical  systems  are 
organized  into  “levels”  or  “scales”  of  some  variable,  just  like  organizational 
and  economic  structures,  and  the  positional  notation  of  arithmetic.  In 
this  sense,  a  multiresolution  or  scalable  mathematical  representation  may 
provide  a  simpler  and  more  efficient  representation  than  conventional  math¬ 
ematical  representations.  We  delve  more  deeply  into  this  basic  concept  in 
the  next  chapter. 


2.1.2  Computational  Complexity 

Omputational  complexity  is  a  measure  of  the  number  of  elementary  oper¬ 
ations  need  to  solve  a  computational  problem.  Computational  complexity 
if  indirectly  a  measure  of  time  to  solution  for  a  given  computing  system 
as  well  as  a  measure  of  the  cost  of  solution.  Therefore  the  computational 
complexity  of  the  numerical  solution  of  a  problem  is  a  critical  factor  in 
deciding  whether  a  proposed  sdution  method  can  be  practical. 

Problems  of  design  or  active  operation  use  measured  data  as  an  input.  If 
the  problem  requires  N  data  points,  then  the  computational  complexity  of  a 
sr'ution  must  be  at  least  0{N),  since  reading  the  data  into  the  computer  is 
already  an  0(7V)  operation;  this  is  the  best  that  can  be  done.  Conventional 
nr  athematical  operations,  such  as  matrix  multiplication  or  calculation  of 
tre  finite  Fourier  transform,  have  been  superseded  by  “fast”  algorithms 
that  make  their  use  practical.  Thus,  the  complexity  of  the  fast  Fourier 
transform  is  only  0{N\ogN),  compared  with  0{N^)  for  the  conventional 
finite  Fourier  transform. 

We  shall  see  that  wavelet  transform  operations  are  typically  -  0{N)  - 
asymptotically  the  best  that  could  be  hoped  for. 

It  is  possible  to  reduce  computational  cost  without  reducing  compu¬ 
tational  complexity.  If  one  method  produces  much  better  ^proximations 
than  another,  then  fewer  data  points  will  be  required  to  provide  the  de¬ 
sired  sdution  accuracy.  Reduction  in  the  quantity  of  data  is  often  a  more 
important  practical  factor  than  the  abstract  complexity  of  a  calculation. 
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2.2  Information  Gained  from  Measurement 


Immediately^  upon  its  invention,  the  digital  computer  was  recognised  as  an 
entirely  new  kind  of  machine  whose  major  influence  would  be  on  information¬ 
intensive  activities  —  the  life  of  the  mind  —  rather  than  on  energy-intensive 
activities,  which  had  been  the  case  for  all  previous  machine  inventions.  In  a 
prescient  analysis  in  1948  of  the  potential  effect  of  computers  on  the  social 
and  economic  structure,  Norbert  Wiener  ([73],  pp.27-28)  drew  attenticm  to 
the  distinctive  and  unprecedented  role  of  this  machine  at  the  level  of  mind. 

Today,  the  digital  computer  dominates  the  scene  in  psychology  and 
biology  as  the  principal  metaphor  for  the  operation  of  the  brain^  and  some 
physicists  and  computer  scientists  have  argued  that  the  physical  universe 
itself  can  best  be  understood  and  modeled  as  an  enormous  digital  computer 
whose  evolution  represents  the  working  out  of  a  program  the  source  and 
purpose  of  which  remain  unknown. 

Thus  it  is  not  surprising  that  the  word  information  has  taken  on  special 
meanings  in  all  phases  of  everyday  life  as  well  as  in  science  and  philosophy: 
we  speak  of  and  hear  about  information  age,  information  glut,  information 
management,  information  overload,  information  society,  information  tech¬ 
nology,  information  theory,  and  information  workers  without  end.  These 
modifications  of  vocabulary  merely  reflect  the  now  universally  recognized 
truth  that  information  is  a  fundamental  constituent  of  the  world  that  in¬ 
teracts  with  such  traditional  and  immediately  physical  constituents  such  as 
the  mass  and  charge  of  material  bodies,  and  space  and  time  as  the  stage  on 
which  natural  philosophy  plays  [49].  Nevertheless,  philosophers  and  ixien- 
tists  no  less  than  economists  still  have  great  difficulty  in  identifymg  exactly 
what  information  is,  how  it  interacts  with  the  other  constituents  of  physical 
or  societal  reality,  how  to  quantify  it  in  the  national  as  well  as  the  natural 
accounts,  and  even  how  to  recognize  it. 

In  this  section  we  will  examine  the  role  of  information  at  the  fundamen¬ 
tal  level  of  measurement,  which  is  the  interface  between  science  and  experi¬ 
ence,  and  calculation,  which  we  think  of  as  the  interface  between  measure¬ 
ment  and  mind.  We  shall  see  that  a  multiresolution  or  scaled  representation 
is  the  essential  ingredient  for  extracting  information  from  observations. 


*Thi«  aection  u  a  modified  venion  of  [50]. 

*Thi«  iiieti4>hor  baa  been  impUdUy  adopted  by  art  Uetoriane  in  their  aaalyeU  of  the 
role  of  the  peychophydc*  of  vieion.  A  etandard  presentation  is  Gombticfa  [16]. 
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2.2.1  Digital  Computers  and  Measurement 

A  binary  digital  computer  deals  directly  with  dyadic  rational  numbers,  that 
is,  with  numbers  of  the  form 


r 


where  q  is  odd  or  sero  and  n  is  an  integer.  Moreover,  any  operaticm  that 
a  digital  computer  can  perform  can  be  thought  of,  or  encoded  as,  a  finite 
sequence  of  bits  -  of  zeros  and  ones  -  and  hence  as  a  particular  binary 
expansion  of  a  dyadic  rational  number.  Thus  the  entire  realm  of  digital 
computers  is  limited  to  the  ring  D  of  dyadic  rationale.^  Moreover,  every 
reading  produced  by  a  measuring  instrument  is  a  rational  number  and,  if 
the  instrument  is  constrained  so  that  the  bits  in  the  binary  expansion  of 
a  measured  value  can  be  read  off  one  by  on*',  then  the  instrument  only 
produces  dyadic  rationals^  Thus,  direct  observation  of  the  physical  world 
can  only  produce  numbers  of  exactly  the  sort  that  a  digital  computer  can 
use. 

Until  recently,  when  one  spoke  of  using  mathematics  to  model  natural 
phenomena,  one  meant  the  calculus.  The  ideas  that  underlie  the  calcu¬ 
lus  were  systematically  introduced  in  the  late  seventeenth  century  and  the 
principal  properties  of  derivatives  and  integrals  were  obtained  by  means  of 
informal  arguments  that  employed  infinitesimal  quantities.  Leibniz  recog¬ 
nized  that  the  introduction  of  infinitesimals  implied  the  existence  of  ideal 
infinitely  small  and  infinitely  large  numbers  that  share  the  properties  of 
the  ordinary  real  numbers  and,  although  mathematicians  were  unable  to 
justify  the  use  of  these  ideal  numbers  -  indeed,  the  real  numbers  themselves 
were  not  fully  understood  until  the  late  nineteenth  century  -  infinitesimals 
were  freely  used  in  research  and  m  mathematics  education,  as  they  still 
tend  to  be  used  in  engineering  and  other  branches  of  applied  mathematics 

*A  ring  U  a  mathematical  structure  in  whidi  additioo,  subtraction  and  multiplication 
can  be  performed.  If  division  by  every  non-sero  element  is  also  possible,  mathematicians 
call  the  structure  a  field.  The  reader  will  note  that  a  reciprocal  of  a  dyadic  rational  is 
not  in  general  a  dyadic  rational.  This  is  why  the  numbers  known  to  a  oonqmter  do  not 
form  a  field.  A  rational  number  is  one  of  the  form  tn/n  where  m  and  n  are  integers  and 
n  ^  0.  The  set  of  all  rational  numbers,  denoted  Q,  is  the  smallest  infinite  field  and  the 
ring  D  is  contained  within  it. 

^This  argument  is  not  strictly  correct.  The  computer  can  represent  an  integer  by 
its  finite  binary  expansion,  whidi  is  a  dyadic  rational  number.  Therefore,  an  arbitrary 
rational  number  can  be  identified  with  the  ordered  pair  of  dyadic  ratioeial  integers  that 
are  the  numerator  and  denominator  of  the  rational  number  in  reduced  form.  Shnilaily, 
algebraic  numbers  sudi  as  Vs,  Vl7,  etc.,  can  be  coded  by  finite  sequences  of  integers, 
so  they  can  also  be  expressed  exactly  in  a  distal  computer.  A  measuring  apparatus, 
however,  cannot  usually  be  arranged  to  measure  the  numerator  and  denominator  of  a 
rational  nuidber  separatdy. 
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today.  As  Abraham  Robinson  [51]  observed,  ‘The  idea  of  the  infinitesi¬ 
mally  small  seems  to  appeal  to  our  intuition.”  It  was  not  until  1960  that 
infinitesimals  were  put  on  a  sound  logical  basis  by  Robinson’s  introduction 
of  non-standard  analysis,  which  relies  on  the  modern  theory  of  models  in 
mathematical  logic  to  introduce  a  structure  that  extends  the  field  R  of 
real  numbers  to  include  an  infinite  hierarchy  of  infinite  and  infinitesimal 
numbers  in  a  way  that  insures  the  validity  of  first  order  predicates  in  the 
extended  system;  cf.  [51].  This  extension  provides  a  kind  of  multiresolution 
analysis  for  infinite  and  infinitesimal  quantities. 

The  kinds  of  numbers  that  seem  to  be  needed  for  the  theoretical  math¬ 
ematics  of  the  calculus  are  evidently  very  different  from  the  numbers  that 
are  produced  by  direct  observation  of  the  physical  world:  measurements 
are  not  unrestricted  real  numbers  which,  like  the  transcendental  x,  require 
infinitely  many  bits  to  specify  their  binary  expansion,  and  they  certainly 
are  not  one  of  Leibniz’  and  Robinson’s  infinitesimals.  We  appear  to  be 
faced  with  two  unpleasant  alternatives:  either  real  numbers  (and  possibly 
infinitesimals)  occur  in  the  world  but  cannot  be  measured  exactly  (which 
introduces  an  inherent  uncertainty  in  measurement  and  in  all  deductions 
based  on  measurements),  or  measurements  reveal  all  that  there  is  to  know 
(so  that  our  mental  models,  dependent  upon  the  real  numbers  and  perhaps 
even  upon  infinitesimals,  are  overdetermined:  they  contain  far  too  many 
constraints  to  accurately  reflect  the  phenomena). 

These  remarks  recall  Kronecker’s  century-old  criticism  of  the  “com¬ 
pleted  infinite,”  that  nothing  could  be  s«ud  to  have  mathematical  existence 
unless  it  could  actually  be  constructed  in  terms  of  a  finite  number  of  pos¬ 
itive  integers.  In  his  view,  the  rational  numbers  exist  since  they  can  be 
represented  as  the  ratio  of  two  integers  but  transcendental  numbers  like 
T  do  not  exist,  since  they  require  infinitely  many  fractions  or  operations 
for  their  representation.  Kronecker  is  thus  the  mathematician  in  whom 
a  digital  computer  can  believe.  He  proposed  a  program  to  “aritbmetize” 
mathematics  and  eliminate  from  it  all  “non-constructive”  concepts.*  It  is 

*  “And  if  I  can't  do  thi*,“  he  said,  “it  will  be  done  by  those  who  cocne  after  me!”  ([48], 
p.  26).  Indeed,  beginning  in  1907,  L.  E.  J.  Brouwer  put  forward  a  program  of  “con- 
•tructivist  mathematics”  but  he  made  little  headway  among  practicing  matbematidans, 
who  were  ooncerned  that  Brouwer’s  views  would  unnaturally  and  unnecessarily  limit  the 
development  of  mathematics.  More  recently,  Bishc^  continued  the  attempt  to  “purge 
[mathematics]  oonq>letely  of  its  idealistic  content.”  Hear  Bishop  (  [4],  p.2): 

Mathematics  belongs  to  man,  not  to  God.  We  are  not  interested  in  the 
properties  of  the  positive  integen  that  have  no  descriptive  meaning  for 
finite  man.  When  a  man  proves  a  positive  integer  to  exist,  he  should  know 
how  to  find  it.  It  God  has  mathematics  of  his  own  that  needs  to  be  done, 
let  him  do  it  himself. 
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indeed  time  to  re&camine  the  relationship  between  mathematics  and  natu¬ 
ral  philosophy  in  the  new  light  provided  by  the  theory  of  computability  and 
in  terms  of  the  information  that  calculations  and  physical  measurements 
can  actually  supply. 

2.2.2  Uncertainty  and  Natural  Philosophy 

During  the  past  100  years  philosophy  and  science  have  conspired  to  teach 
us  the  linuts  of  reason  and  observation.  The  set-theoretic  paradoxes  of 
Cantor  and  the  logicians,  Einstein's  elimination  of  absolute  time  and  space, 
Heisenberg’s  introduction  of  uncertainty  as  the  indispensable  ingredient 
in  the  microworld,  and  Godel’s  proof  that  mathematics  and  logic  are  not 
strong  enough  to  prove  aQ  that  is  true,  limit  what  can  be  known.  It  may 
seem  paradoxical  that  this  period  of  lowered  expectations  of  what  can  be 
known  has  coincided  with  the  period  of  greatest  expansion  and  precision  of 
what  is  known  in  mathematics  and  science.  Evidently  it  is  better  to  know 
one’s  limits  than  to  thrash  about  in  unfulfillable  expectation. 

It  was  during  this  period  of  intellectual  ferment  that  probability  and 
statistics  entered  scientific  thinking  in  a  fundamental  way.  The  concepts  of 
probability  and  information  first  influenced  each  other  in  the  second  half 
of  the  nineteenth  century  in  the  work  of  Maxwell  and  Boltzmann  on  sta¬ 
tistical  thermodynamics,  where  entropy  gain  was  associated  with  loss  of 
information.  Boltzmann’s  distribution  for  a  thermodynamical  system  in 
equilibrium  defines  an  entropy  measure  which  is  equivalent  to  the  informa¬ 
tion  measure  introduced  by  Shannon  [53]  three-quarters  of  a  century  later 
in  the  context  of  electrical  communication  theory. 

At  the  macroscopic  level  there  are  circumstances  in  which  it  is  possible 
-  in  principle,  at  least  -  to  isolate  the  information  gained  from  a  single 
measurement  without  prior  knowledge  of,  or  reference  to,  probability  dis¬ 
tributions.  Where  quantum  phenomena  are  concerned,  the  outcome  of  a 
measurement  is  conditioned  by  the  initial  state  of  the  system  which,  if  it 
is  not  pure  -  not  an  “eigenstate”  -  will  be  represented  by  a  state  vector  in 
Hilbert  space,  say 

where  the  il>a  ue  a  system  of  eigenstates  and  the  complex  coefficients  Cq 
define  a  probability  distribution  {pa}  by  the  formula 


Pc.  =  Ical’ 


The  number  pa  is  the  probability  that  measurement  of  the  quantum  sys¬ 
tem  will  show  it  to  be  in  the  state  labelled  a.  It  is  the  essence  of  quantum 
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phenomena  that  systems  cannot  long  remain  in  pure  states,  so  measure¬ 
ments  cannot  be  sharp  (von  Neumann  [39]  treats  this  question  in  detail). 
Shannon’s  measure  of  information  associated  with  the  discrete  probability 
distribution  {pa}  is 

a 

where  ‘Ig’  denotes  the  ba8e-2  logarithm  and  information  I  is  measured  in 
bits.  The  minus  sign  is  introduced  to  insure  that  I  is  non-negative.  If  the 
state  of  a  quantum  system  prior  to  an  observation  is 

and  its  state  subsequent  to  measurement  is 


then  the  information  gain  is 

A/  =  /({|C«-p})-7({|c«"-|*}). 


Quantum  mechanics  asserts  that  a  measurement  forces  a  physical  system 
into  a  pure  state,  which  immediately  begins  to  decay  into  a  superposition 
of  states.  Thus  the  “final”  state,  by  which  we  understand  the  pure  state  in 
which  the  physical  system  finds  itself  as  an  inunediate  consequence  of  the 
measurement,  is  one  for  which  exactly  one  of  the  coefficients  c^”**  is  not 
zero,  whence  for  some  state  It  follows  from  the  definition  of 

I  that  =  0.  Therefore  the  gain  in  information  is  measured  by 

the  quantity 

A7  =  /({|cj."i“-«p})  =  . 


which  is  non-negative.  This  shows  that  a  quantum  mechanical  measurement 
will  produce  an  increase  in  information  as  long  as  the  initial  state  is  not 
pure. 

Uncertainty  is  the  part  of  life  in  the  quantum  world  of  which  we  have 
no  longer  any  doubt,  but  it  is  still  not  generally  realized  that  quantum 
uncertainty  is  inherited  from  the  mathematical  model  that  is  used  to  de¬ 
scribe  the  physical  phenomena.  So-called  “conjugate”  physical  quantities, 
such  as  position  and  momentum,  or  time  and  energy,  correspond  to  linear 
functicnak  that  are  Fourier  transform  pairs.  It  is  a  mathematical  fact  that 
a  function  (or  functional)  x  *-*  f{x)  and  its  Fourier  transform  y  f{y) 
satisfy  the  uncertainty  inequality  (cp.  [49],  pp.132-139): 

A.A»  >  i  . 
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Here  the  physical  interpretation  of  x  and  y  will  be  that  of  conjugate  dynam¬ 
ical  variables.  If  a  measurement  increases  information  about  one  quantum 
mechanical  dynamical  variable  by  decreasing  the  uncertainty  concerning  is 
magnitude,  then  the  uncertainty  principle  requires  that  the  measurement 
increase  the  uncertainty  about  the  magnitude  of  the  conjugate  dynamical 
variable  and  thereby  produce  a  corresponding  loss  of  information. 

2.2.3  Axioms  for  Measures  of  Information  Gain 

The  goal  of  a  calculation  is  to  determine  the  value  of  some  number  within 
certain  bounds  of  accuracy.  The  information  gained  from  the  calculation 
is  therefore  measured  by  the  difference  between  the  information  about  the 
number  that  was  known  before  the  calculation  began,  and  the  information 
known  about  the  number  after  the  calculation  is  completed.  Since  all  that 
the  calculation  can  produce  is  some  finite  sequence  of  bits  of  the  binary 
representation  of  the  number,^  the  gain  in  information  must  be  measured 
by  the  new  bits  that  the  calculation  produces.  Indeed,  as  Wiener  [73] 
may  have  been  the  first  to  observe,  the  measure  of  information  gained  is 
just  the  number  of  new  bits  that  the  calculation  produces.  This  raises  an 
important  problem,  for  it  means  that  it  is  impossible  to  know  the  exact 
numerical  value  as  a  “real  number”  of  any  physical  measurement  because 
a  real  number  requires  infinitely  many  bits  for  its  complete  specification. 
Since  the  measurement  of  a  bit  and  the  calculation  of  a  bit  both  require  en¬ 
ergy,  the  only  way  in  which  a  numerical  value  could  be  determined  exactly 
from  observations  would  be  if  the  energy  £„  required  to  observe  the  n***  bit 
in  the  binary  expansion  of  a  number  were  to  decrease  sufficiently  rapidly 
with  increasing  n  that  the  infinite  series  ^  En  (which  represents  the  total 
energy  required  to  observe  the  number)  converged  to  a  finite  value.  But 
the  consequences  of  such  an  assumption  are  contrary  to  experience.  Even 
if  the  observed  numerical  quantity  were  a  rational  number  -  even  if  it  were 
the  number  1  -  there  would  be  no  way  to  verify  this  by  direct  physical  mea¬ 
surement  or  observation  of  the  bits  of  the  number’s  binary  expansion.  This 
is  a  type  of  “uncertainty  principle”  which  seems  to  be  more  fundamental 
and  universal  than  the  principle  of  Heisenberg;  indeed,  the  latter  may  be  a 
consequence  of  it. 

At  this  point  we  must  take  stock  of  where  we  are  and  where  we  want 
to  go.  Our  immediate  goal  is  to  quantify^  the  information  gained  from 
a  calculation,  and  to  understand  how  the  multiresolution  structure  of  the 
number  system  and  the  measurement  process  make  this  possible.  Shannon’s 

*Far  convenicDce,  and  without  k>M  of  gcnenlity,  we  may  euppoee  that  all  numben 
are  repreeented  in  base  2  positional  notation. 
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measure  of  information  will  not  do  for  this  purpose  since  there  is  no  natural 
way  to  associate  a  probability  distribution  with  either  a  calculation  or  a 
limited  number  of  measurements.  But  ad  hoc  introduction  of  a  new  measure 
of  information  will  not  do  either.  So  let  us  return  to  first  principles  and 
attempt  to  axiomatize  the  properties  of  a  reasonable  measure  of  information 
gain,  and  then  derive  the  quantitative  form  that  information  gain  must 
assume.  We  follow  the  discussion  in  [49]. 

A  prototypical  physical  measurement  consists  of  determining  the  length 
of  a  rod  by  aligning  it  alongside  a  ruler.  Let  the  true  length  of  the  rod 
be  denoted  by  /,  and  suppose  that  the  ruler  is  the  real  number  line  sub¬ 
divided  by  marks  in  the  ordinary  way.  The  “multiresolution”  process  of 
measurement  will  produce  not  the  true  length  /  but  some  finite  sequence  of 
bits  that  can  be  interpreted  as  the  initial  (i.e.  most  significant)  bits  in  the 
binary  expansion  of  some  real  number.  The  finite  number  of  bits  produced 
by  the  measurement  exactly  define  the  binary  expansion  of  some  dyadic 
rational  number.  Successive  refinement  of  the  measurement  (with  the  aid 
of  a  magnifying  lens,  perhaps)  will  lead  to  increasingly  accurate  dyadic  ra¬ 
tional  approximations  to  /,  but  we  cannot  infer  anything  at  all  about  the 
as  yet  unmeasured  bits.  This  process  of  successive  refinement  should  be 
thought  of  as  selection  of  successively  smaller  intervals  of  the  real  line  that 
contdn  the  number  /.  This  is  the  multiresolution  step;  A  measurement  of 
the  length  of  the  rod  produces  a  pair  of  (dyadic  rational)  numbers  xi  and 
yi  that  are  the  coordinates  of  the  interval  containing  /,  such  that 

<  /  <  yi  , 

and  a  second  measurement  that  refines  the  first  produces  a  pair  of  numbers 
T2  and  y2  such  that 

z\  <Z2<1  <y2<y\  ■ 

Our  task  is  to  determine  this  gain  in  information  provided  by  the  pair  of 
measurements. 

Since  we  are  assuming  that  the  gain  in  information  depends  only  on 
the  measured  intervals  in  which  I  is  cont^ed,  we  may  express  the  gain  in 
information  as  a  function  of  the  endpoint  coordinates  of  the  pair  of  intervals: 

A/  =  A/(xi,yi,Z2,y2)  • 

The  coordinates  of  the  endpoints  of  the  measurement  intervals  depend  on 
the  unit  of  measurement  of  the  ruler  (the  numbers  Xi,yi,Z2,y2  will  be 
different  if  centimeters  or  inches  are  marked)  and  on  where  the  zero-point 
of  the  ruler  is  placed  alongside  the  rod,  but  the  gain  in  information  is  surely 
independent  of  both.  This  implies  that  the  function  A/  that  measures 
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the  gain  in  information  must  be  invariant  under  the  transformations  that 
correspond  to  scale  changes,  i.e. 


1 1-»  at 

where  a  ji  0,  and  under  transformations  that  correspond  to  a  rigid  motion 
which  shifts  the  position  of  the  zero  point  of  the  scale  without  chan^ng  the 
distance  between  any  pair  of  marks,  i.e. 

The  most  general  combination  of  these  two  types  of  transformation  has  the 
form 

i  at  *4“  ^ 

with  a  ^  0  and  b  an  arbitrary  re»]  number.  Let  us  write  this  transformation 
as  t  H-+  ja,b(t)-  The  collection  of  these  transformations  constitutes  the  affine 
group,  and  our  first  axiom  states  that  the  function  A7  that  measures  the 
information  gain  is  invariant  -  i.e.  unchanged  -  by  the  action  of  affine 
transformations: 


Axiom  1:  For  any  dyadic  rational  numbers  w,i,y,z  and  any  affine  trans¬ 
formation  7,  the  gain  in  information  satisfies  the  equation 

7(*). 7(y). 7(-’'))  =•  A/(u;, ar, y,  r)  . 


Suppose  that  scientists  in  two  laboratories  are  racing  to  be  the  first  to 
measure  the  length  I  of  the  unicorn’s  horn.  Both  laboratories  find  that 
*1  <  /  <  yi-  Proceeding  with  caution.  Laboratory  A  then  finds 

Zi  <  X2  <  I  <  y2  <  Vi  ■ 

Emboldened  by  this  result.  Laboratory  A  improves  its  measurement  tech¬ 
nique  and  obtains  the  refinement 

X2  <  X3  <  I  <  ys  <  y2  . 

Laboratory  B,  off  to  a  late  start,  throws  caution  to  the  winds  and,  in  one 
astonishing  experiment,  finds  the  estimate 

Xi  <  xa  <  I  <  ys  <  yi  . 

What  is  the  relationship  of  the  information  gains  reported  by  Laboratory 
A  to  the  information  gain  realized  by  Laboratory  B? 
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Clearly,  the  gain  in  information  obtuned  by  Laboratory  A  in  its  first 
experiment  is 

A/j»(a:i,yi,at2,y2)  , 

and  in  the  second, 

A/A(x2,y2,ar3,y3)  , 

for  a  total  gain  of 

A/x(xi,yi,x2.y2)  + A7^(x2.y2i*3.y3)  • 

The  gain  reported  by  Laboratory  B  is 

A/A(*i,yi,*3,y3)  • 

Were  the  the  total  gain  in  information  reported  by  Laboratory  A  greater 
than  the  gain  reported  by  Laboratory  B,  then  investigations  that  proceed 
by  a  sequence  of  small  incremental  improvements  would  be  preferred,  but  if 
the  information  gain  reported  by  Laboratory  B  were  greater,  then  a  research 
strategy  of  fewer  but  more  significant  experiments  would  be  desirable.  Ex¬ 
perience  supports  reason  when  it  tells  us  that  both  procedures  produce  the 
same  gain  in  information.  This  additivity  of  information  is  the  content  of 
our  second  axiom; 


Axiom  2: 

A/(xi,yi,X3,y3)  =  A/(ii,yi,X2.!/2)  + A7(z2,y2,*3.lft)  • 
These  two  axioms  imply  that  the  gain  in  information 


A7(xi,yi,X2,y2) 

is  a  function  of  the  mtio  (i.e.  not  of  the  variables  separately) 

-  Vi 

*2-1/2  ’ 

that  is, 

A7(xi,yi,X2,y2)  =  9  {— — 

V*2  -ViJ 

for  some  as  yet  unknown  function  g,  and  that  g  satisfies  the  relation 


g{uv)  =  g{u)  -b  y(t;) 


Good  Appraximatioas 


45 


for  positive  numbers  u  and  v. 

It  is  not  bard  to  show  that  g  is  the  logarithm  function. If  the  unit  of 
information  is  the  bit,  then 


A/(zi,|/i,X2,y5)  =lg(^|i — . 

\*2  -  1/2/ 

One  important  consequence  of  this  formula  is  that  more  than  one  measure¬ 
ment  must  be  made  in  order  to  gain  information.  A  single  measurement 
lacks  a  standard  scale  against  which  information  gain  can  be  evaluated.^' 
The  essential  nature  of  a  standard  of  comparison  has  been  intuitively 
understood  for  a  long  time.  Circa  1630  Constantyn  Huygens,  father  of  the 
scientist  and  secretary  to  the  first  stadholder  of  the  Dutch  Republic,  wrote 
in  his  Autobiography  (See  [3],  p.l8  ): 

...  the  estimation  which  we  commonly  make  of  things  is  vari¬ 
able,  untrustworthy,  and  fatuous  insofar  as  we  believe  that  we 
can  eliminate  every  comparison  and  can  discern  any  great  dif¬ 
ference  in  size  merely  by  the  evidence  of  our  senses.  Let  us  in 
short  be  aware  that  it  is  impossible  to  call  anything  “little”  or 
“large”  except  by  comparison.  And  then,  as  a  result,  let  us 
firmly  establish  the  proposition  that  the  multiplying  of  bodies 
...  is  infinite;  once  we  accept  this  as  a  fundamental  rule  then 
no  body,  even  the  most  minute,  may  be  so  greatly  magnified 
by  lenses  without  there  being  reason  to  assert  that  it  can  be 
magnified  more  by  other  lenses,  and  then  by  still  others,  and  so 
on  endlessly. 

Leone  Battista  Alberti  ([3],  p.22),  writing  circa  1450  on  painting  and  sculp¬ 
ture,  had  already  acknowledged  the  relativism  of  measurement  : 

[Properties]. ..which  philosophers  term  “accidents”  because  they 
may  or  may  not  be  present  in  things,  -  all  these  are  such  as  to 
be  known  only  by  comparison. 

and  continued  with  the  perceptive  gloss  that 

**’ReUUvdy  weak  technical  mathematical  aiaumptiaiu,  eudi  ae  continuity  of  g,  ate 
required  to  prove  this  result.  Without  any  additional  assumptions  one  can  prove  that 
g  is  comidetdy  detennined  by  the  numbers  {g{pk)  :  1  <  k  <  oo),  where  the  pa  run 
through  the  prime  numbers,  and  where  the  values  g{pk)  can  be  chosen  arbitrarily.  Thus 
S  it  a  kind  of  “generalized”  logarithm. 

**[49]  shows  how  Shannon’s  formula  for  information  spears  at  the  memure  of  the 
sversf  e  gain  in  mformatiaa  produced  by  a  collection  of  measurements. 
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As  man  is  best  known  of  all  things  to  man,  perh^s  Protagoras, 
in  saying  that  man  is  the  scale  and  measure  of  all  things,  meant 
that  accidents  in  all  things  are  duly  compared  to  and  known  by 
the  accidents  in  man. 

Thus  far  we  have  considered  the  information  gained  from  an  observa¬ 
tion  or  a  measurement.  It  has  been  implicitly  assumed  that  something  "out 
there”  is  being  observed  or  measured,  so  theories  and  models  of  external 
reality  play  some  role  in  what  can  and  cannot  be  measured.  An  analysis 
of  the  information  gained  by  mental  activity  leads  to  similar  conclusions. 
By  far  the  simplest  and  most  clear-cut  case  concerns  mathematical  mental 
activity,  which  is  precisely  defined  and  easily  described.  When  the  informa¬ 
tion  gained  from  a  calculation  is  considered,  it  makes  no  difference  whether 
the  calculation  is  performed  entirely  in  a  mind,  or  is  assisted  by  paper  and 
pencil,  or  is  actually  performed  by  a  machine.  This  is  the  bridge  between 
philosophical  considerations  of  the  relationship  of  information  to  measure¬ 
ment,  and  the  theory  of  compactly  supported  wavelets,  which  is  a  method 
for  extracting  information  by  means  of  mathematical  calculation. 


Chapter  3 

Wavelets:  a  Positional 
Notation  for  Functions 


3.1  Multiresolution  Representation 

Muhireaolution  representation  is  a  new  term  for  a  very  old  idea.  It  describes 
what  is  also  called  a  hierarchical  structure.  Hierarchical  structures  organize 
information  into  categories  called  levels  and  usually  arrange  it  so  that  the 
higher  in  the  hierarchy  a  level  is,  the  fewer  the  number  of  members  it  has. 
Hierarchies  are  familiar  in  social  and  political  organizations:  a  country  has 
many  states;  a  state  has  many  counties;  a  county  has  many  towns.  The 
italicized  words  are  names  for  leveb  in  one  way  of  organizing  political  sub¬ 
divisions  in  a  hierarchy.  A  hierarchical  or  multiresolution  structure  provides 
different  ways  of  grouping  things  to  reveal  aspects  of  structure  that  depend 
on  the  scale  of  activity.  Recalling  the  discussion  of  the  previous  section,  we 
see  that  a  multiresolution  structure  provides  the  necessary  mechanism  for 
extracting  information  from  a  sequence  of  measurements  or  calculations. 

In  the  biological  realm,  the  human  vision  system  employs  several  mul¬ 
tiresolution  structures.  One  design  objective  of  the  vision  system  is  to  pro¬ 
vide  wide  aperture  detection  (so  events  can  be  detected  early)  and  high  res¬ 
olution  detection  (so  that  the  detailed  structure  of  the  visual  event  can  be 
seen).  Since  the  vision  system  has  limited  bandwidth,  the  Uncertainty  Prin¬ 
ciple  tells  us  that  these  objectives  are  fundamentally  incnnpatible.  Nature 
has  evolved  a  multiresolution  solution  which  allocates  the  limited  available 
bandwidth  in  two  parts:  the  bulk  of  the  retinal  receptors  are  arranged  as 
a  wide  aperture  but  low  acuity  sensor  (the  ‘principal  part”),  while  a  snnall 
fraction  of  the  sensors  form  the  fovea,  which  has  much  higher  resolution  but 
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Figure  3-1:  The  basic  step  of  a  multiresolution  analysis 

Principal  Part 


Input 


Residual  Part 


a  narrow  aperture  (the  “residual  part”).  This  system  provides  multireso¬ 
lution  information  because,  after  detecting  peripheral  motion,  we  iam  ovr 
gaze  to  see  the  details.  The  vision  system  trades  time  for  bandwidth.  This 
is  a  classic  engineering  trade-off  that  must  be  made  in  many  problems.  As 
we  shall  see,  wavelet  mathematics  systematizes  the  trade-off  process. 

Mass  production  is  another  example  of  a  multiresolution  or  hierarchi¬ 
cal  structure  that  increases  efficiency  by  partitioning  a  task  into  principal 
and  residual  parts,  and  repeating  that  process  until  only  simple  tasks  re¬ 
main.  Hub-and-spoke  airline  route  structures  illustrate  yet  another  way 
multiresolution  analysis  can  provide  functional  economies. 

The  concept  of  muUiresobtiion  analysis  underlies  the  theory  of  wavelets. 
The  idea  is  simple  and  ancient:  separate  the  information  to  be  analyzed 
into  a  “principal”  part  and  a  “residual”  part.  In  applications  to  signal 
processing  the  residual  part  should  be  thought  of  as  primarily  “high  pass” 
and  the  principal  part  as  primarily  “low  pass.”'  For  other  applications  the 
interpretation  of  ‘principal  part”  and  “residual  part”  will  be  different. 

The  process  of  decomposition  can  be  applied  again  to  one  or  both  of 
the  parts.  If  it  is  repeatedly  applied  to  the  low  pass  part,  this  process  is 
exactly  the  one  introduced  by  S.  Mallat  [34]  in  1987  to  <^culate  the  wavelet 
expansion  of  a  sequence  of  discrete  numbers  (cp.  figure  3-1). 

*The  rcMon  for  this  identiiicstiaa  n  that  there  are  more  ‘^lish  frequency"  etatee  than 
"low  frequency”  etatee,  eo  reduction  of  con^lexity  amounte  to  edection  of  a  euitable 
"low  paee”  principal  part. 


Wavelets:  A  Positional  Notation  tar  f\iiictjons 
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3.2  The  Democratization  of  Arithmetic: 
Positional  Notation  for  Numbers 

The  most  important  example  of  a  multiresolution  analysis  is  positional  no¬ 
tation  for  numbers.  Positional  notation,  first  introduced  by  Babylonian 
mathematicians  about  4000  years  ago  for  base  60,  is  probably  the  most 
important  single  discovery  in  mathematics.  Before  the  invention  of  posi¬ 
tional  notation,  the  computational  complexity  of  arithmetic  was  too  great 
for  a  part-time  “amateur”  to  master;  arithmetic  was  the  province  of  trained 
specialists.  Positional  notation  democratised  arithmetic. 

Positional  notation  made  arithmetic  easy;  anyone  could  learn  it.  Po¬ 
sitional  notation  facilitated  other  mathematical  discoveries,  including  cal¬ 
culus,  and  thereby  ultimately  made  science  and  engineering  and  finance 
possible.  Think  of  using  Roman  numerab  to  balance  a  checkbook  or  design 
a  bridge!  Positional  notation  also  makes  the  telephone  system  practical, 
because  without  it  telephone  numbers  and  the  switching  circuits  that  use 
them  would  be  too  complicated  and  costly  to  support  a  large  number  of 
subscribers.  In  fact,  the  largest  numbers  that  people  freely  use  in  daily  life 
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are  telephone  numbers.^ 

Ck>nsider  “base  2”  positional  notation  from  the  perspective  of  multires¬ 
olution  analysis.  The  set  of  integers  Z  can  be  separated  into  subsets  of  even 
integers  and  odd  integers.  Each  positive  integer  n  can  be  written  in  the 
form 

n  =  2t  To, 

where  r  is  the  remainder  after  division  by  2  and  2k  >  0.  We  shall  say  that 
2k  is  the  principal,  or  low  pass,  part  of  n  and  that  r  is  the  residual,  or  high 
pass,  part.  The  value  of  the  residual  part  (0  or  1)  determines  whether  n  is 
even  or  odd.  Repetition  of  this  procedure  on  the  successive  low  pass  outputs 
produces  a  sequence  of  high  pass  residuals^  rjv,  ■  ■  • ,  ri ,  rg  that  correspond 
to  the  formula 

n  =  5^ri2‘.  (3-1) 

i>0 

The  sequence  of  bits  rn  •  •  -rirg  is  the  binary  expansion  of  n. 

The  powers  of  2  that  appear  in  eq(3-l)  can  be  thought  of  as  basis  vectors 
labeUed  by  2*. 

Anticipating  later  needs,  we  will  introduce  a  function  to  label  the  basis 
elements;  consider 

ip(z)  := 

^An  internstiafial  telephone,  eudi  M  011-1-617-577-1700,  ie  about  aa  large  as  the  gross 
domestic  product  of  the  United  States  ($1,116,175,771,700  S  tl.l  x  10*^  ~  fl  trillion). 

*We  list  the  residuals  from  right  to  kft. 
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It  is  a  trivial  fact  that  ip  satisfies  the  sealing  equation* 

<p{x)  =  2(pi2z). 

Now  label  the  basis  element  2*  by  writing  using  the  formula  2*  =  ^(1/2*). 
Successive  basis  vectors  {^(2*)}  are  chained  together  by  the  scaling  equar 
tion 

<p{l/V+^)  =  2^(2(l/2‘+»))  =  2<p(l/T),  (3-2) 

which  is  a  complicated  way  of  writing  2‘'*’*  =2-2*,  but  a  way  which  puts 
in  evidence  the  essential  property  of  the  basis  vectors,  which  is  that  they 
establish  a  muHiresolution  representation  for  a  number.  In  terms  of  these 
basis  vectors,  a  general  expansion  with  integer  coefficients  takes  the  form 

r  =  ^niip{l/r), 

t=0 

and  the  sum  is  a  dyadic  rational  number  r.  This  series  is  in  normal  form 
if  the  coefficients  are  0  or  1.  The  “scaling  equation”  eq.  (3-2)  is  a 
prescription  for  reducing  the  general  series  to  its  normal  form,  and  the 
normal  form  is  exactly  the  binary  expansion  of  n.  The  reduction  works 
this  way.  Consider  an  arbitrary  expansion  n  =  5I<=o”»y*(V2*)-  Since 
V>(l/2°)  :=  2°  =  1,  we  can  write  no  =  no^(l).  Separate  no  into  its  principal 
and  residual  parts: 

no  —  2ko  *!■  *’0- 

and  use  the  scaling  equation  to  find 

noVc»(l)  =  (2fco  +  »’o)^(l) 

=  2ko<f>(l)  -f  ro^(l) 

=  ko<p{l/2)  +  roy>(l). 

Combine  terms 

ioV>(l/2)  +  niv»(l/2)  =  (to  +  ni)y?(l/2), 

and  apply  the  scaling  equation  again.  Repetition  of  this  process  produces 
the  normal  form  for  the  series,  i.e.  the  binary  expansion  of  n. 

The  reader  should  think  of  wavelet  analysis  as  doing  for  the  represen¬ 
tation  of  functions  what  positional  notation  does  for  the  representation 
numbers.  The  wavelet  series  represents  a  function  in  a  way  that  displays 
the  information  content  that  each  term  of  the  series  provides.  This  is  the 
basic  reason  that  the  simple  idea  of  wavelets  is  producing  a  r^>idly  growing 
circle  of  practical  consequences. 

*Id  thiaform,  «c  will  be  able  to  tee  the  reUsioarilip  between  binary  positiaiialiioUtioa 
and  ‘Vank  3”  wavelets  introduced  in  the  next  diapter. 
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3.3  Music  Notation  as  a  Metaphor  for  Wavelet 

Series 

A  musicial  instrument  maker  uses  the  physical  properties  of  materials  to 
build  instruments  that  shape  sound.  Implicit  in  noaking  and  playing  an 
instrument  is  a  psychoacoustic  model  of  hearing  that  describes  -  in  mathe¬ 
matical  terms  -  how  people  perceive  sound,  and  an  implicit  cognitive  model 
of  beauty  and  meaning  that  governs  which  sounds  are  worth  hearing. 

Music  notation  is  a  kind  of  graph  for  representing  a  simplified  version 
of  the  acoustic  waveform  that  causes  us  to  hear  music.  The  horizontal  axis 
of  the  graph  represents  time.  The  vertical  axis  is  arranged  like  a  piano 
keyboard  -  equal  distances  on  this  axis  correspond  to  equal  changes  in  the 
apparent  pitch  of  a  tone,  not  to  equal  changes  of  the  physical  frequency  of 
vibration  of  the  resonating  instrument  that  created  it. 

In  classical  western  music  the  unit  of  measure  of  pitch  is  the  octave. 
Pure  tones  spaced  an  octave  apart  have  frequencies  whose  ratio  is  2:1. 
This  measuring  scheme  reflects  properties  of  the  ear,  not  of  the  instrument. 
The  audible  frequency  range  is  partitioned  into  octaves,  and  each  octave  is 
further  divided  into  whole  tones  and  semitones.  Thus  the  octave  defines  a 
scale  in  terms  of  which  the  entire  range  of  frequencies  is  partitioned. 

This  scaled  system  is  a  multiresolution  analysis  of  the  pitch  of  tones 
whose  units  are  naturally  related  to  the  human  ear  that  senses  sound.  In 
this  system,  the  scale  in  which  the  pitch  of  a  tone  is  measured  is  proportional 
to  the  logarithm  of  the  frequency  of  the  tone.  The  factor  of  proportionality 
is  not  fixed;  different  orchestras  prefer  factors  that  are  similar  but  not 
identical. 

Musical  notes  placed  on  the  score  have  a  pitch  coordinate  and  a  time  co¬ 
ordinate.  The  pitch  coordinate  specifies  the  fundamental  tone  to  be  played 
by  the  musician.  The  time  coordinate  specifies  when  the  musician  should 
start  to  play  the  tone.  The  shape  the  note  symbol  indicates  how  long  it 
should  be  played.  The  score  does  not  specify  bow  loud  each  played  note 
should  be,  but  the  composer  may  provide  rough  guidance  here  and  there. 
The  composer  may  also  indicate  the  instrument  which  should  be  used  to 
play  the  note;  this  detemines  the  >'armonic  structure  of  overtones  that  wiU 
be  heard. 


3.4  Wavelet  Phase  Space 

Each  of  these  features  of  music  notation  has  its  counterpart  in  the  wavelet 
representation  of  a  function.  When  a  function  of  time  is  represented  by 
wavelets,  the  coefficients  of  the  wavelet  series  correspond  to  the  notes. 


tional  Notation  far  functions 


I;  Diagram  of  wavelet  phase  space  with  keyboard. 
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Each  coefficient  cj^k  is  indexed  by  two  integers.  The  term  in  the  wavelet 
series  with  this  coefficient  is  where  il>j,k{x)  is  the  wavelet  basis 

function.^.  The  numerical  value  of  the  coefficient  has  a  physical  meaning: 
it  is  the  amplitude  of  the  signal,  and  like  the  amplitude  of  a  tone,  |c,-,tp 
represents  the  energy  carried  by  the  basis  function  il>jk{x).  The  integer  j  is 
caJled  the  scale  or  level;  like  musical  pitch,  it  is  not  a  frequency  but  more 
like  the  logarithm  of  a  frequency.  The  musical  octave  corresponds  to  the 
multiplier  2  of  the  wavelet  multiresolution  analysis. 

The  index  k  describes  the  time  interval  during  which  the  energy  rep¬ 
resented  by  the  coefficient  acts.  The  ratio  k/2^  is  the  onset  of  the  action, 
and  the  effect  of  the  energy  represented  by  the  coefficient  lasts  for  a  time 
proportional  to  1/2^  units*. 

The  wavelet  representation  contains  all  the  information  necessary  to 
reconstruct  the  sound  waveform. 


^The  wavelet  function  tie  defined  in  Chapter  4 

*The  conitant  ie  proportional  to  the  length  of  the  rapport  of  wavelet  acaling  function, 
which  dependa  on  the  choice  of  the  wavelet  baaU,  juat  as  the  reaonance  propeitiea  of  a 
tone  depend  on  the  inatnunent  that  made  it. 
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Figure  3-6:  Wavelet  phase  space. 
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(b)  Phase-^»ceiq)fesentation  of  local  energy;  Square  of  wavelet  transfonn 
coefficients  for  a  refined  Mallat  tree  and  the  wavelet  matrix  DIO. 
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Figure  3-7:  Phase  space  representation  of  an  impulse  function;  Energy  for 
D3  wavelet  transform. 
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Figure  3-8:  Modulus  of  the  Fourier  transform  of  the  scaling  function  for 
the  non-separable  2-dimensional  Novon  Haar  wavelet  matrix. 


58 


An  Essay  on  Wavelet  Technology 


Aware,  Inc.  AD921018 


Part  II 


Theory 


Chapter  4 


Wavelet  Theory 


4.1  Comparison  with  the  Fourier  Transform 


Three  types  of  Fourier  transform  appear  in  scientific  applications:  the  con¬ 
tinuous,  the  discrete  and  the  finite.  The  types  are  classified  according  to 
whether  the  values  of  the  Fourier  transform  are  continuous,  discrete,  or 
finite  variables.  These  classes  correspond  to  a  theoretical  continuous  model 
of  a  process;  to  an  infinite  sequence  of  idealized  discrete  measurements;  and 
to  the  case  where  only  a  finite  number  of  measurements  are  available,  which 
is  the  one  that  actually  occurs. 

Corresponding  to  erM:h  type  of  Fourier  transform  is  a  type  of  wavelet 
transform.  In  this  section  the  types  of  Fourier  transform  and  the  corre¬ 
sponding  types  of  wavelet  transform  are  compared  to  motivate  the  mathe¬ 
matical  definitions  in  the  next  section. 


Table  4.1:  Three  types  of  Fourier  Transform 

Type _ Domain  Range 

~R  R 

R  Z 

Z/(m)  Z/(m) 


Continuous 

Discrete 

Finite 
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4.1.1  The  Continuous  l^ansform 

The  continuous  Fourier  transform  is  a  unitary  transformation  of  the  space 
L^(R)  of  signals  with  finite  energy.*  The  continuous  Fourier  transform  is^ 

/(y)  :=  r  (4-1) 

and  the  inverse  transform  is 

f{x)  =  r  e’«»*/(y)dy  •  (4-2) 

•/— 00 

The  Fourier  transform  is  a  function  of  the  continuous  variable  p;  if  x  is 
interpreted  as  time,  then  y  is  interpreted  as  frequency. 

Corresponding  to  the  continuous  Fourier  transform  is  the  confinaotts 
wavelet  transform 

H^(/)(«.  v)  :=  1“  (^y(x)dx  ,  (4-3) 

with  inverse 


The  analyzing  wavelet  V*  plays  the  role  of  the  exponential  function;  for  each 
choice  of  analyzing  wavelet  there  is  a  distinct  continuous  wavelet  transform. 

The  wavelet  transform  is  a  function  of  two  variables.  1/v  is  a  variable 
analogous  to  frequency,  and  the  ratio  u/v  has  the  same  dimensions  as  x; 
thus,  the  wavelet  transform  is  a  function  of  position  in  the  time-frequency 
“phase  space.” 

However  important  it  is  for  theoretical  purposes,  the  continuous  Fourier 
tranform  is  never  used  in  practical  applications  for  two  simple  reasons:  ob¬ 
served  signals  are  only  known  for  intervals  of  finite  duration,  and  the  For  rier 
integral  defined  on  the  entire  real  axis  cannot  be  efficiently  calculates.  For 
the  same  reasons,  the  continuous  wavelet  transform  is  of  theoretical  but  not 
practical  importance.  For  the  analyzing  wavelet  kernel  il>,  it  is  even  more 
difficult  and  costly  to  compute  the  double  integral  in  the  wavelet  transform. 


*That  i«,  conqilex  valued  functioua  of  a  real  variable  that  are  aquare  iategrable  with 
req>ect  to  Lebeague  measure. 

^This  formula  aaeumea  that  /  ia  integr^le  with  respect  to  Lebesgue  measure.  FUno- 
tions  that  are  not  integrable  are  also  important. 
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4.1.2  The  Discrete  IVansform 


The  discrete  Fourier  transform  is  the  relati<»iship  between  a  periodic  func¬ 
tion  /  €  1])  and  its  Fourier  series  coefficients,  viz., 

c„:=  /‘e->'‘»*/(*)d*.  (4-5) 

Jo 

Its  inverse  is  the  Fourier  series  expansion 


/(*)= 

neZ 


(4-6) 


The  discrete  Fourier  transform  is  the  functi(»i  c  :  Z  — »  C.  lYom  it,  one 
can  construct  the  short  time  Fourier  transform,  which  is  the  collection  of 
Fourier  series  expansions  of  a  function  /  :  R  — »  C  successively  restricted  to 
the  intervals  [n,  n  1],  where  n  €  Z. 

Let  m  >  1  be  an  integer.  The  discrete  wavelet  transform  is  defined  by 
the  formula 


V’(m»*  —  k)f(x)dz. 


(4-7) 


where  j,k  &Z.  The  inverse  transform  is  the  wavelet  series 


/(*) = E  E  —  *)  •  (4-8) 

jcZsgZ 

The  wavelet  series  coefficients  Cj^t,  are  the  values  of  the  discrete  function 
(ji  *)  Cj.t  where  c  :  Z  x  Z  — *  C.  The  wavelet  function  V"  satisfies  certain 
conditions  that  will  be  described  in  section  4.2. 


4.1.3  The  Finite  lYansform 

Let  m  be  a  positive  integer  and  denote  by  Z/(m)  the  set  {0, 1, ...  ,m  —  1}. 
The  finite  Fourier  transform  is  the  unitary  linear  transformation 

it  := -i  exp(-2xtl;//m)v(  (4-9) 

mcZ/(m) 

that  converts  the  m-dimensional  vector  v  €  C”*  into  the  vector  v  €  C”*. 
Its  inverse  is  the  finite  Fourier  series  expansion 

exp  (2xil//m)  w;  . 

m€Z/(m) 


Vk  ;= 


(4-10) 
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Table  4.2:  Three  types  of  Wavelet  Transform 

Type _ DomMn  Range _ 

Continuous  R  R  x  R"*" 

Discrete  R  Z  x  Z 

Finite _ Z _ Z  x  Z/(m) 


The  vectors  v  and  v  should  be  thought  of  as  functions  v  :  Z/(m)  — »  C  and 
i  :  Z/(m)  — *  C  defined  on  the  finite  set  Z/(m). 

Corresponding  to  the  finite  Fourier  transform  is  the  finite  wavelet  trans¬ 
form  of  an  arbitrary  sequence  f  ih*-*  C,  defined  by  ® 

n€Z 

with  inverse 

/(«)=  E  E«*+n-  (^12) 

0<r<m  tg2I 

In  these  formulae  m  is  an  integer  greater  than  1  and  a  =  (a|^)  is  a  wavelet 
matrix  of  rank  m.  Wavelet  matrices  and  their  rank  are  defined  in  the  next 
section. 

4.2  Mathematical  Definition 

4.2.1  The  Basic  Ingredients 

In  1988  Ingrid  Daubechies,  a  Belgian  mathematician  working  at  the  Courant 
Institute  of  New  York  University,  discovered  a  far-reaching  generalization 
of  the  compactly  supported  orthonormal  basis  introduced  by  Alfred  Haar 
in  1910  [19].  Daubechies’  generalization  -  the  compactly  supported  scal¬ 
ing  functions  and  wavelets  -  are  a  new  class  of  functions  with  remarkable 
advantages  for  analysis  and  practical  computation. 

The  simplest  Daubechies  scaling  function  and  its  associated  wavelet  are 
illustrated  in  figures  4-1  and  4-2.  Each  of  these  unusual  functions,  whose 
serrated  graphs  suggest  a  connection  with  fractal  curves,  has  support  of 
length  3  units;  outside  that  interval,  each  function  is  zero.  Both  functions 
are  continuous,  but  neither  is  differentiable. 


*We  will  generalise  to  negative  m  in  Chapter  ??. 


Wavelet  Theory 


65 


Figure  4-1:  The  D2  scaling  function  ip  (m  =  2,g  =  3). 


From  these  functions  an  orthonormal  basis  for  all  signals  that  have  finite 
energy  or  are  the  sum  of  a  finite  energy  signal  and  a  signal  that  has  locally 
finite  energy  can  be  built  by  the  simple  construction  described  in  section  4. 

The  fundamental  building  block  of  wavelet  theory  is  the  vmvelei  matrix. 
A  wavelet  matrix  is  an  m  x  array  of  real  or  complex  numbers,  where 
m  >  2  and  y  >  1;  m  is  called  the  rank  of  the  wavelet  matrix  and  g  is  called 
its  genus.  The  wavelet  matrix  can  be  thought  of  as  a  rectangular  array 
which  consists  of  g  square  arrays;  each  square  array  is  an  m  x  m  matrix.  If 
g  is  finite  the  rows  of  the  wavelet  matrix  are  finite  dimensional  vectors. 

The  rows  of  a  wavelet  matrix  and  the  infinite  collection  of  vectors  formed 
from  them  by  shifting  each  row  a  multiple  of  m  places  either  right  or  left 
forms  an  orthonormal  basis  that  can  be  used  to  express  any  discrete  func¬ 
tion.  This  expression  is  the  wavelet  matrix  series  for  the  discrete  function. 

Using  the  wavelet  matrix,  a  basis  ccxisisting  of  wavelet  functions  can 
be  built  for  the  space  of  square  integrable  functions  on  the  real  line  - 
i.e.,  functions  that  represent  signals  having  finite  energy.  The  relationship 
between  the  wavelet  matrix  series  for  functions  of  a  discrete  variable  n  C  Z 
and  the  wavelet  function  series  for  functions  of  a  real  variable  z  €  R  is  the 
key  to  the  success  of  wavelets  in  mediating  between  physical  models  of  a 
process  and  the  discrete  sets  of  numbers  that  come  from  measurements. 

Each  row  of  the  wavelet  matrix  can  be  thought  of  as  a  vector  whose 
components  are  the  tap  values  of  a  finite  impulse  response  (TIR”)  digital 
filter;  this  is  the  source  of  the  connection  between  wavelets  and  digital  signal 
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Figure  4-2:  The  D2  ivavelet  \(>  (m  =  2,  jr  =  3). 


processing. 

4.2.2  Ways  to  Increase  Spectral  Resolution 

Suppose  that  a  signal  is  measured  at  equally  spaced  intervals  at  a  rate  of 
m  samples  per  unit  time.^  It  is  often  important  to  represent  the  campled 
signal  in  terms  of  its  spectral  content.  For  a  function,  this  is  accomplished 
by  the  Fourier  trwsform,  which  converts  time  information  to  frequency 
information. 

The  theory  of  wavelets  reconciles  two  fundamentally  different  approaches 
to  increasing  the  spectral  resolution  of  a  digital  Alter.  Conventiond  digital 
signal  processing  employs  an  m-band  single  phase  perfect  reconstruction 
Alter  represented  by  a  unitary  m  x  m  matrix  of  numbers  to  “block  trans¬ 
form”  digital  input  data.  The  numbers  in  each  row  of  the  matrix  are  the 
values  of  the  taps  of  a  correlator.  The  block  transform  trades  the  time 
resolution  of  the  input  signal  for  spectral  resolution  (relative  to  the  par¬ 
ticular  polyphase  Alter)  of  the  output.  The  larger  m  is,  the  greater  is  the 
time  resolution  of  the  input,  and  the  greater  the  spectral  resolution  of  the 
transform  output.  Typically,  the  polyphase  Alter  is  successively  ^plied 
to  non  overlapping  sequences  of  m  input  data  points,  so  that  the  result 
is  a  succession  of  “block  transforms”  of  the  input  datastream.  Examples 


the  limit  m  — .  oo  we  ehall  think  of  thi*  a*  oontinuou*  eampling. 
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of  such  single  phase  block  transforms  are  the  Finite  Fourier  IVansform, 
the  Discrete  Cosine  TVansform,  the  Hadamard  Transform,  the  Chebyshev 
'Dansform,  etc. 

Wavelet  digital  signal  processing  employs  an  m  x  mg  perfect  reconstruc¬ 
tion  filter  where  >  1  and  the  matrix  is  ‘^araunitary.”  The  wavelet  matrix 
filter  is  applied  to  sequences  of  mg  input  data  points,  which  are  successively 
shifted  through  the  filter,  m  points  at  a  time.  If  jr  >  1,  the  result  is  a  suc¬ 
cession  of  “overlapping  transforms”  of  the  data.  Since  the  correlators  that 
correspond  to  each  row  of  the  wavelet  matrix  have  mg  tiq>s,  these  filters 
can  be  very  long  and  their  spectral  resolution  can  be  correspondingly  fine. 
Since  the  wavelet  filters  are  shifted  through  the  data  by  only  m  data  points 
at  a  time,  time  resolution  is  not  entirely  lost. 

Each  of  the  m  outputs  of  a  wavelet  matrix  filter  can  be  employed  as 
the  input  to  another  wavelet  matrix  filter  which  further  refines  the  spec¬ 
tral  resolution  of  its  input.  Repetition  of  this  construction  produces  a  tree 
of  wavelet  matrix  filters  that  can  be  tailored  to  particular  requirements. 
The  filters  can  be  different  and  the  tree  need  not  be  complete.  The  re¬ 
sult  of  applying  the  tree  of  filters  to  a  stream  of  data  is  a  multiresolution 
representation  for  the  data. 

When  the  wavelet  matrices  are  arranged  in  an  array  according  to  their 
rank  m  (equal  to  the  number  of  output  bands)  and  genus  g  (equal  to  the 
length  of  the  correlator  filters  measured  in  multiples  of  m  taps)  an  ar¬ 
rangement  like  Table  4.3  results.  The  conventional  block  transforms  are 
arranged  in  the  first  column,  with  spectral  resolution  increasing  with  m. 
The  wavelets  discovered  by  Daubechies  [9]  (corresponding  to  wavelet  matri¬ 
ces  for  which  m  =  2),  are  arranged  in  the  first  row,  their  spectral  resolution 
increasing  with  g.  The  remainder  of  the  array  is  largely  terra  incognita  but 
a  few  special  cases  have  been  previously  investigated.  The  general  theory 
of  wavelets  provides  a  systematic  approach  to  investigating  this  territory, 
and  principles  for  employing  orthonormal  overlapped  perfect  reconstruction 
filters  as  primitive  elements  in  the  design  of  very  large  filters. 

The  theory  of  wavelets  is  also  a  mathematical  theory  that  establishes 
a  new  class  of  relationships  between  representations  for  discrete  functions, 
i.e.  functions  such  as  digitally  sampled  signals  defined  on  a  lattice  (say 
Z)  and  functions  defined  on  the  real  line  R.  The  discrete  part  of  the  the¬ 
ory  is  essentially  algebraic  -  this  is  the  theory  of  wavelet  matrices.  The 
“continuous”  part  of  the  theory  (by  which  we  mean  the  theory  of  wavelets 
for  functions  defined  on  continue  like  R)  is  essentially  function-theoretic. 
The  relationship  between  the  two  provides  a  sturdy  bridge  between  scien¬ 
tific  models  for  physical  phenomena  and  the  necessarily  discrete  processes 
of  measurement  and  computation  that  underlie  engineering  ?naly8is  and 
design. 
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4.2.3  Rank  m  Wavelet  Matrices 


Let  m  and  g  be  positive  integers  with  m  >  1.  A  wavelet  matrix  of  rant  m 
and  genua  y  is  an  m  x  mg  matrix  of  complex  numbers, 


/  oo 


-mg 


-1  \ 


a  := 


V 

\  ^0 


1  / 


which  satisfies  the  wavelet  scaling  conditions 

53a*+m/'at+m/  = 


and 


k 


(4-13) 


(4-14) 


(4-15) 


The  set  of  all  wavelet  matrices  of  rank  m  and  genus  g  with  matrix  entries 
drawn  from  a  ring  or  field  T  is  denoted  WM(m,  <?;/■).  T  will  be  omitted 
when  it  is  evident  from  the  context. 

The  quadratic  conditions  eq(4-14)  assert  that  the  rows  of  the  wavelet 
matrix,  when  considered  as  vectors,  have  the  common  length  y/m\  that 
distinct  row  vectors  are  orthogonal;  and  that  distinct  row  vectors  remain 
orthogonal  when  shifted  relative  to  each  other  by  multiples  of  m  elements. 
This  last  property  is  responsible  for  the  overlap  of  the  support  of  the  wavelet 
basis  functions  when  ^  >  1.^ 

The  linear  conditions  eq(4-14)  distinguish  the  first  row  of  a;  it  is  called 
the  “scaling  vector.”  In  a  digital  signal  processing  context,  it  is  also  called 
the  “low-pass”  filter  because,  when  the  rows  of  a  are  employed  as  taps 
for  m  correlator  filters,  the  dc  part  of  a  digital  signal  is  contained  in  the 
output  of  the  low-pass  correlator.  The  remaining  m  —  1  rows  are  said  to 
be  “high-pass.”  If  the  linear  conditions  are  not  satisfied,  then  dc  is  passed 
through  some  linear  combination  the  corresponding  correlator  filters  and 
the  filter  is  “paraunitary.” 

The  matrix  (STtoJ+mt),  0  <  r,s  <  m  is  a  wavelet  matrix  of  genus 
p  =  1;  it  is  called  the  characteristic  Eaar  wavelet  matrix  associated  with 
a.  Figure  4.2.3  associates  wavelet  matrices  organized  by  rank  and  genus 
with  the  types  of  problems  that  they  can  be  used  to  solve.  We  shall  have 
more  to  say  about  the  relationship  between  problem  classes  and  the  rank 
and  genus  parameters  when  we  study  wavelet  applications  in  Part  III. 

*The  condition  is  vacuous  when  s  =  1. 
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Table  4.3:  Conventional  digital  signal  processing  transforms  within  the 
framework  of  the  arbitrary  rank  wavelet  theory  (m=rank,  j;=genus  of 
wavelet  transform  and  Dj;=Daubechies’  wavelets  of  genus  g). 


9  =  1 

2 

3 

9 

.  .  . 

m  =  2 

Haar 

D2 

D3 

.  .  . 

D9 

.  .  . 

8 

WM(8,1) 

WM(8,ff) 

8-pt  DCT 

8-pt  FFT 

8-pt  Hadamard 

8-pt  Chebyshev 

6-pt  Slant 

* 

; 

m 

WM(m,  1) 

; 

* 

OO 

WM(oo,l) 

WM(oo,p) 

r 
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Figure  4-3;  Typical  transform  application  domains  within  framework  of 
the  arbitrary  rank  wavelet  theory  (m  =  rank  and  y  =  genus  of  wavelet 
transform. 
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Wavelet  matrices  and  wavelet  bases  can  be  defined  for  multidimensional 
signals;  cp.  Lawton  and  Resnikoff  [31]. 


4.3  Examples  of  Wavelet  Matrices  for  m  =  2 

Rank  2  wavelet  matrices  are  the  most  important  special  class.  For  this  case 
an  abbreviated  and  specialized  notation  is  useful.  If 

is  a  rank  2  wavelet  matrix  of  genus  g,  then  the  first  row  of  a  is  said  to  be  a 
scaling  vector.  A  scaling  vector  can  always  be  completed  to  a  unique  wavelet 
matrix  whose  second  row  is  determined  by  the  choice  of  characteristic  Haar 
wavelet  matrix.  The  formula 


it  =  a^s-i-k 

corresponds  to  the  Haar  wavelet  matrix 


This  is  the  selection  that  we  will  make  in  what  follows. 


4.3.1  The  sine  Wavelet  Matrix  (m  =  2,  =  cx>) 

The  sine  wavelet  matrix  of  rank  m  =  2  is  an  example  of  a  non  compact 
wavelet  matrix,  that  is,  a  wavelet  matrix  for  which  g  =  oo.  It  b  defined  by 

a  line  :=  (ot) 


where 


021 


_o 

02i+l 

<*21 

<*2t+l 


f  I  if  4  =  0 
\  0  else 

2  (-1)* 

X  24  +  1  ’ 

2(-l)*-^ 

X  24-1  ’ 
f  -1  if  4  =  0 
[  0  else. 
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Gregory’s  formula 

z  =  ^ 

implies  that  a  ^nc  satishes  the  linear  wavelet  conditions, 

X(^  ainc  )  ~ 

The  orthonormality  conditions  are  a  consequence  of  Euler’s  formula 


4.3.2  Real  Wavelet  Matrices  for  (m  =  2,  g  =  2)  and 
(m  =  2,g  =  3) 

A  real  valued  wavelet  matrix  of  rank  m  =  2  and  genus  9  <  3  is  determined 
by  its  scaling  vector  (the  low  pass  row)  and  by  which  of  the  two  possible 
real  characteristic  Haar  wavelet  matrices  of  rank  two  is  selected.  The  six 
coefficients  of  the  scaling  vector  can  be  expressed  in  terms  of  two  continuous 
parameters  a  and  0,  each  of  which  varies  in  the  interval  [0,2s’).  After 
taking  into  account  identifications  -  i.e.,  parameter  pairs  that  correspond 
to  the  same  scaling  vector,  it  is  seen  that  the  parameter  space  can  be 
identified  with  a  pinched  two  dimensional  real  torus.  Aware’s  logo  (see 
figure  1)  is  a  square  that  can  be  rolled  up  to  become  a  torus  which,  when 
the  diagonal  from  lower  left  to  upper  right  is  “pinched”  to  a  point,  becomes 
the  parameter  space. 

The  Haar  wavelet  matrix  is  the  only  wavelet  matrix  of  genus  9  =  1;  it 
corresponds  to  the  pinched  point.  For  wavelet  matrices  of  genus  g  =  2  or 
^  =  3  the  equations  that  express  the  coefficients  in  terms  of  the  parameters 
are: 


aj  =  1/2(1  +  cos  o); 

Og  =  1/2(1  +  •s/^ sin  or); 

p  =  1/201 -a5)*  +  (l-o«)2; 

og  =  1/2(1  -  o5)  +  p  cos 
flj  =  1/2(1  -  03)  +  psin 
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a?  =  1  -  03  -  oj: 

0°  =  1  -  Oq  -  Oj 


4.4  Orthonormal  Bases  of  Wavelet  Matri¬ 
ces. 


4.4.1  Wavelet  Expansion  of  Discrete  Functions 

l,ei  f  :  )-»  C  be  a  complex  valued  discrete  function,  and  let  a  be  a 

wavelet  matrix  of  rank  m  and  genus  g  <  oo.  Then  /  has  the  wavelet  matrix 
series 


^  (to)  ~  ^i°'mk+n  • 

0<r<mtgZ 

whose  coefficients  are  given  by  the  formula 

toS^(to)^‘+"‘ 


(4-17) 


(4-18) 


In  these  formulae  the  summations  are  finite  sums  that  can  be  calculated  in 
a  finite  number  of  steps,  so  they  are  suited  to  practical  needs. 

If  the  sampling  rate  were  1  sample  per  unit  interval,  so  /  :>  c,  the 
formulae  would  become: 


/(”)  -  I3r  12k 


(4-19) 


Every  discrete  function  can  be  expanded  in  a  wavelet  matrix  series.  This 
shows  that  wavelet  matrix  series  are  more  general  than  the  finite  Fourier 
transform,  which  requires  the  function  to  be  periodic,  or  Fourier  series  and 
wavelet  series,  which  demand  a  “finite  energy”  restriction  to  insure  that  / 
does  not  grow  too  rapidly  for  the  sums  to  converge.  The  sums  defining  the 
coefficients  of  a  wavelet  matrix  series  are  finite  so  they  always  exist. 

The  Shannon  sampling  theorem,  well  known  to  communications  engi¬ 
neers,  is  a  formula  that  interpolates  a  function  whose  values  are  known  for 
uniformly  spaced  sampling  points.  If  the  sampling  points  are  labelled  by 
the  integers  and  {/(n)  :  n  €  Z}  is  given,  then  Shannon’s  fcvmula  states 
that 

/(*)=  /(*)«“<:»(*-*) 

ft€Z 
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if  the  function  /  is  "band  limited”  to  the  interval  [—1/2, 1/2],  i.e.  if  the 
Fourier  transform  of  /  is  supported  on  [—1/2, 1/2]. 

It  turns  out  that  sincx  is  a  scaling  function  that  corresponds  to  a 
wavelet  matrix  whose  genus  is  jr  =  oo,  i.e.,  a  wavelet  matrix  that  has 
infinitely  many  columns.  If  the  values  /(n)  are  considered  as  a  discrete 
function,  then  /  has  a  wavelet  matrix  expansion  with  respect  to  the  sine 
wavelet  matrix. 

Higher  Rank  Daubechies>like  Wavelet  Matrices 

Pollen  [40]  and  Pollen  and  Linden  [41]  derived  the  scaling  vector  for  the 
rank  m  real  wavelet  matrices  for  genera  g  =  2  and  p  =  3  for  which  the 
space  spanned  by  the  translates  of  the  scaling  function  contains  an  arbi¬ 
trary  polynomial  of  degree  g  —  I,  just  as  Daubechies’  scaling  functions  D2 
and  D3  do  for  m  =  2.  Later  Heller  [23]  determined  the  complete  wavelet 
matrices  for  these  cases,  including  their  dependence  on  the  choice  of  the 
characteristic  Haar  wavelet  matrix. 

For  riuik  m  and  genus  g  =  2  the  components  of  the  smooth  scaling 
vector  are 

*  “0  +  ^.  0<s<m, 

“m+,  =  1-oJ,  0<s<m, 

where 


The  high  pass  elements  of  the  wavelet  matrix  are  naturally  more  com¬ 
plicated.  We  find 

aj  =  +  0<r<m,  0<s<m, 

<+.  =  Hl-ar„  0<r<m,  0  <  s  <  m, 

where  H  is  the  characteristic  Haar  wavelet  matrix  of  a  and 


It  is  easy  to  verify  that  the  support  of  a  rank  m  genus  g  scaling  function 
and  its  associated  fundamental  wavelet  is  dense  in  the  interval  [0,y  -I- 
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Figure  4-4;  The  Daubechies  scaling  function  of  genus  2  for  m  =  2  and  the 
analogous  scaling  function  of  genus  2  for  m  =  5, 8. 


for  jr  =  2  the  support  of  these  associated  functions  has  length 
2-»-l/(m-l). 

Figure  4-4  illustrates  the  scaling  functions  for  ranks  2,3,  and  8.  It  sug¬ 
gests  that  these  scaling  functions  have  a  limit  as  m  oo.  In  the  next 
section  we  shall  see  that  the  limit  is  a  piecewise  linear  function  supported 
on  [0,2]  which  gives  rise  to  a  new  spline-like  infinite  rank  orthonormal 
wavelet  basis. 

4.4.2  Infinite  Rank  Smooth  Wavelet  Matrices 

The  limit  of  infinite  rank  makes  sense,  and  leads  to  new  orthonormal  bases 
of  functions  in  I'^(R)  that  are  piecewise  polynomial  of  degree  9  —  1  if  the 
underlying  wavelet  matrix  is  smooth  of  tank  g  ~  1.  If  9  =  2,  the  scaling 
vector  and  the  infinitely  many  fundamental  wavelet  vectors  are  supported 
on  intervads  of  length  2,  and  the  basis  is  obtained  by  integer  translations 
of  the  scaling  function  and  fundamental  wavelets;  for  m  =  00  there  are  no 
“wavelets”  of  smaller  support;  the  expansion  of  a  function  is  not  a  mul¬ 
tiresolution  analysis. 

The  infinite  rank  limit  of  the  quadratic  wavelet  matrix  constraint  eq(4- 
14)  is  the  orthogonality  condition 

Jr\r  +  x)a'il  +  x)dx  =  6'’’'6,j, 

where 

a*-(/  +  x)  :=  ^^1™ 

The  zeroth  moment  constraint  eq(4-15)  takes  the  form 
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The  wavelet  matrix  series  for  an  arbitrary  discrete  functions  passes  over 
into  the  formula 

=  (4-20) 

r  k 

and  the  expansion  coefficients  are  calculated  from 

cj  =  J '^(k  +  x)f{x)dx.  (4-21) 

These  “waveletized  Fourier  series”  formulae  generalize  the  classierd  Fourier 
series  expansion  for  a  periodic  function. 


4.5  Orthonormal  Bases  of  Compactly  Sup¬ 
ported  Wavelets. 

4.5.1  The  Wavelet  Basis 

Once  a  wavelet  matrix  of  finite  genus  is  given,  a  compactly  supported 
orthonormal  wavelet  basis  can  be  constructed.*  Let  a  denote  a  complex 
wavelet  matrix  with  rank  m  and  genus  g.  With  the  aid  of  a,  a  scaling  Unc¬ 
tion  and  m  —  1  fundamental  wavelets  are  defined  by  the  system  of  equations 

=  S  “I  -  k) ,  (4-22) 

k 

where  0  <  r  <  m.  Notice  that  only  the  scaling  function  appears  on  the 
right  side  of  these  equations.  The  m  —  1  fundamental  wavelets  are  explicitly 
defined  in  terms  of  the  scaling  function  by  the  equations  for  0  <  r  <  m, 
whereas  the  scaling  function  is  implicitly  defined  by  the  recursive  equation 
corresponding  to  r  =  0.  The  scaling  function  is  to  be  thought  of  as  a  “low- 
pass”  function  while  the  fundamental  wavelets  are  “high-pass”  functions. 

Even  in  the  simplest  case  these  equations  also  have  solutions  whose 
support  is  not  compact.  For  instance,  the  simplest  scaling  recursion  is 


V>(i)  =  y>(2i)  +  ^(2i  -  1), 


where  m  =  2  and  y  =  1.  It  has  the  standard  Saar  scaling  function  solution, 


f  1  for  0  <  *  <  1, 
(  0  otherwise 


*Exceptiaaal  caaa  occur;  cp.  [29]. 
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which  is  unique  on  the  open  interval  (0, 1),  but  this  simple  functional  equa^ 
tion  also  has  uncountably  many  other  solutions  that  differ  from  each  other 
in  nontrivial  ways.  For  instance,  figure  &-4  illustrates  a  solution  whose  sup¬ 
port  is  R,  and  the  figure  5-5  zooms  in  on  the  central  part  of  the  graph. 
This  strange  solution  is  square  integrable,  piecewise  linear,  and  continuous 
except  at  X  =  0  and  x  =  1. 

Hereafter  we  shall  assume  that  we  are  dealing  with  a  compactly  sup>- 
ported  solution.  Define  a  collection  of  auxiliary  functions  by  the  fwmula 

¥>jk(*)  :=  -  k)  ,  (4-23) 

where  j,  k  €Z.  The  support  of  ^Jjt(x)  has  length  equal  to  the  length  of  the 
support  of  (p°{x)  divided  by  m>\  the  quantity  j  is  called  the  scale  or  level 

The  main  result  of  the  theory  is 
Theorem. 

is  an  orthonormal  basis  for  f<’(R)  and 

{v>o.t(*)  :  *  €  Z}  [J  {¥>;t(x)  :  0  <  r  <  m,  ;j,k£Z  ,0<j} 

is  an  orthonormal  basis  for  a  function  space  that  contains  the  space  spanned 
by  I>’(R)  and  the  constant  function  x  *->  1. 

4.5.2  Representation  of  Functions  by  Wavelet  Series 

The  wavelet  series  for  /  is 

I  0<r<mj>0  k 

where 

Cj,k  =  J 

for  all  r,j,  k. 

If 

/(x)  =  /(x  +  l) 

is  a  periodic  function,  then  its  wavelet  expansion  is 

/(*)  =  53  +  E  E  • 

icZ  iiOktZ 
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The  wavelet  coefficients  satisfy 

Cj  —  Cq  ,  ^  —  c^ 

for  all  /  €  Z. 

4.5.3  Daubechies  Wavelets 

Daubechies  Wavelet  Matrices 

Daubechies  [9]  showed  that  there  are  wavelet  bases  of  rank  m  =  2  and 
genus  g  such  that  any  polynomial  p(x)  of  degree  at  most  g  —  1  has  a  wavelet 
expansion  of  the  form 

leZ 

p{x)  lies  in  the  space  spanned  by  the  integer  translates  of  the  “low  pass” 
scaling  function.  We  call  these  bases,  and  the  corresponding  wavelet  matri¬ 
ces,  Daubechies  bases  and  Daubechies  wavelet  matrices,  and  use  the  notation 
Dg  for  a  Daubechies  wavelet  matrix  of  genus  g.  We  shall  also  say  that  the 
wavelet  matrix  is  smooth  of  degree  g—1.  There  are  exactly  2*  Daubechies 
wavelet  matrices  of  genus  g  which  differ  inessentially  from  one  another. 

The  Daubechies  wavelet  matrices  are  determined  by  either  of  two  equiv¬ 
alent  conditions.  One  of  the  conditions  asserts  that  polynomials  of  degree 
less  than  g  have  low  pass  expansions;  the  other  condition  asserts  that  dis¬ 
crete  polynomials  of  degree  less  than  g  evaluated  on  a  lattice  (Z  for  example) 
have  low  pass  wavelet  matrix  series  expansions. 

The  first  condition  is  equivalent  to  requiring  that  the  high  pass  moments 
of  the  fundamental  wavelets  vanish,  that  is, 

Mom„(y»’')  ;=  J  =  0 

for  0  <  r  <  m  and  Q<n<  g. 

The  second  condition  is  equivalent  to  requiring  that  the  discrete  mo¬ 
ments  of  the  high  pass  rows  of  the  wavelet  matrix  vanish,  that  is, 

Mom„(a’')  :=  ^  =  0,  0<r<m,  0<n<j. 

k 

It  is  easy  to  prove  that  these  sets  of  conditions  are  equivalent.  This  re¬ 
sult  follows  from  a  formula  that  enables  the  continuous  moments  and  the 
discrete  moments  to  be  expressed  in  terms  of  each  other: 

Mom„(¥>’')  =  i  S  (  j  )  Mom,(o'’)Mom„_^(^’‘). 

"*  jsO  '  ' 
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If  these  conditions  are  satisfied,  then  any  polynomial  of  degree  less  than  g 
has  a  low  pass  wavelet  series  expansion,  i.e.  an  expansion  in  which  only 
the  scaling  function  appears.  In  this  case,  there  is  a  natural  relationship 
between  the  discrete  representation  and  the  continuous  representation  [24]. 
The  reader  will  note  that  the  linear  condition  eq(4-15)  simply  asserts  that 
constwts  have  a  low  pass  expansion. 

An  explicit  formula  for  Daubechies  wavelet  matrices  is  known  for  g  = 
2,3,4  and  5.  For  g  =  2  the  formula  is 


iM  +  v^  Z  +  ^/Z  Z-y/5  l-v/3\ 

4\l-y/Z  -Z  +  VZ  Z+VZ  -l-y/Z  ) 


(4-24) 


For  =  3  let  u  =  VTO  and  v  =  \/5  +  2\/l0.  Then  the  scaling  vector  for 
DZ  is 

(1  -f-  u 5  +  u  +  3i;,  10  —  2u -f  2t;, 

Id 


(10  —  2u  —  2w,5 -I- «  —  3v,  1 -I- «  —  v) .  (4-25) 


4.5.4  Complex  Wavelet  Matrices  and  Wavelets 

Wavelet  matrices  whose  entries  are  complex  numbers  have  additional  de¬ 
grees  of  freedom  which  can  be  useful  for  certain  applications.  For  instance, 
although  real  wavelet  matrices  always  lead  to  scaling  functions  that  are 
not  symmetric,^  complex  wavelet  matrices  can  produce  scaling  functions 
whose  real  and  imaginary  parts  are  symmetric,  and  wavelets  whose  real 
and  imaginary  parts  are  skew  symmetric. 

An  example  is  provided  by  the  rank  m  =  2,  genus  g  =  Z  complex  wavelet 
matrix 


J_/-3-iVl5  h-iVih  30-l-2»Vl5 
32V-3-iVl5  -5-nVl5  30-l-2iVl5 


30  +  2iVl5  5-iVl5  -3-iVl5'\ 
-30-2tVl5  5-tVl5  3-nVl5  ) 


(4-26) 


found  by  Lawton  and  Resnikoff.  It  yields  a  complex  orthonormal  wavelet 
basis  for  which  the  space  spanned  by  the  scaling  function  and  its  integer 
translates  contains  all  polynomials  of  degree  less  than  or  equal  to  2,  and 
the  real  and  imaginary  parts  of  the  scaling  function,  illustrated  in  figure 
4-5  are  symmetric  (the  taller  of  the  two  graphs  is  the  real  part). 


^In  digital  filter  terme,  the  low  paae  row  ia  never  a  fc'aear  pk»»t  filter. 
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Figure  4-5:  A  complex  orthogonal  scaling  function  (p  (rn  =  2,  g  =  3). 


4.5.5  Wavelet  Matrix  Pairs  and  Biorthogonal  Wavelets 

The  definition  of  wavelet  matrix  a  can  be  relaxed  by  permitting  the  two 
occurrences  of  a  in  the  bilinear  form  eq  (4-14)  to  be  different.  A  pair  of 
matrices  {L,  R)  isa  wavekt  matrix  pair  of  rank  m  and  genus  g  il  L  and  R 
axe  m  X  mg  matrices  that  satisfy 

(4-27) 

k 

L  (resp.  R)  is  the  left  (resp.  right)  matrix  of  the  pair.  Real  wavelet 
matrix  pairs  were  introduced  in  conjunction  with  so-called  “biorthogonal 
wavelets”;  cp.  [7]. 

An  important  special  case  occurs  if  the  elements  of  a  wavelet  matrix 
pair  are  complex,  for  then  the  left  matrix  and  the  right  matrix  can  be 
complex  conjugates.  An  interesting  example  for  genus  9  =  3  is  Heller’s 
wavelet  matrix  pair.  Introduce  the  abbreviations 


u  :=  VlO,  1;  :=  -  2v/l0, 

and  define 

, _ _1_/  1  —  u-hw  5  —  u-f3v  10-(-2u-|-2v 

16  \  10  -b  2u  -f  2v  5  —  u  -f-  3v  1  —  u  -f  r 

1  —  u  -b  V  — 5  -f  u  —  3v  10  -b  2u  -1-  2t>  \ 
— 10  — 2u  — 2i;  5  — u-b3«;  — l-fu  — v  J' 


(4-28) 


R  =  I. 


(4-29) 
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Figure  4-6:  Heller’s  complex  biorthogonal  scaling  function  (Tn  =  2,g  =  3). 


This  wavelet  matrix  pair  yields  a  complex  biorthogonal  wavelet  basis  for 
which  the  space  spanned  by  the  scaling  function  associated  with  either  L 
or  R  and  its  integer  translates  contains  all  polynomials  of  degree  less  than 
or  equal  to  2,  and  the  real,  resp.  imaginary,  part  of  each  scaling  function 
is  symmetric,  resp.  skew  symmetric.  The  graph  of  the  scaling  function  <pl 
for  L  is  displayed  in  figure  4-6. 


Fourier  Transform  of  Wavelets 

1-Dimensional  Scaling  Functions 

/(y)  :=  r  e»'‘«"/(x)dr 
y— 00 

denote  the  fourier  transform  of  /.  If  a  is  a  wavelet  matrix  of  rank  m,  and 
if  is  integrable,  then 

^(y)  =  n  { ^  exp(2x«*y/m")|  ^(0). 


4.6 

4.6.1 

Let 


4.6.2  2-Dimensional  Scaling  Functions 

The  Fourier  transform  of  the  2-dimensional  scaling  function  that  satisfies 

the  scaling  recursion 


^  oaV’Cmx  -  A) 

A6A 


82 


An  Essay  on  Wavelet  Tscbnology  Aware,  Inc.  AD921018 


Figure  4-7:  Zeros  of  the  Fourier  transform  of  the  “Rectangle”  Eaar  scaling 
function. 


has  a  similar  expression  as  an  infinite  product . 

<p°{w)  =  n  1  ^  exp(2jriRe  (A/V))|  ^(0), 

n=0  L  XtA  J 

where  A  is  a  lattice.  Figures  4-7,  4-8,  and  4-9  display  the  seros  of  the  2- 
dimensional  Fourier  transform  of  the  Haar  scaling  function  for  the  complex 
multiplier  p  =  i>/5,  p  =  1  1  and  p  =  (1  +  iy/l)/2  respectively,  on  the 

complex  plane.  In  each  case,  all  ceroe  are  real,  and  the  transform  variables 
(frequency)  range  from  -15  to  15.  While  the  case  p  =  i^/2  reminds  one  of 
the  zeros  of  sine  (xz)  sine  (xp),  and  the  case  p  =  1  -f  i  is  more  compli¬ 
cated  because  of  the  rotations  by  multiples  of  x/4  radians,  the  third  case  is 
fundamentally  different  because  an  infinite  group  of  rotations  actos  on  the 
zero  drives. 
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Figure  4-8:  Zeros  of  the  Fourier  transform  of  the  ‘^windragon”  Haar  scal¬ 
ing  function. 


Figure  4-9:  Zeros  of  the  Fourier  transform  of  the  “Novon”  Haar  scaling 
function. 
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4.7  Calculus  of  Wavelets 

4.7.1  Exact  Calculation  of  Wavelet  Function  Values 

Calculation  for  m-ary  rational  numbers 

Suppose  that  a  is  a  wavelet  matrix  of  rank  m  and  genus  g.  The  scaling 
recursion  is  the  system  of  equations 

=  51  a*  -  *).  (4-30) 

How  can  the  functions  (p'‘{x)  be  calculated?  Define 

*;(r)  :=¥)-(j  +  *),  0<x<l  (4-31) 

and 

(4-32) 

where  0  <  r  <  m  and  e  6  {0, 1, . . . ,  m  - 1}.  The  system  of  scaling  recursion 
equations  is  equivalent  to 

(4-33) 

If  X  =  O.xi  ...xj  is  the  base  m  expansion  of  z,  i.e.  x  =  Yll-i  then  the 
previous  equation  is  the  same  as 

*>(0  x1*2  ...*/)  =  i4’'(*i)*°(0.X2 . . . xj).  (4-34) 

Repetition  of  this  formula  yields 

*;(0.XiX2...*7)  =  >4'(*,)A^(x2)...A^(xj)*J(0).  (4-35) 

For  Arbitrary  Rational  Arguments 

It  is  obvious  that  eq  (4-35)  provides  an  exact  formula  for  ip'’(x)  whenever 
X  is  a  rational  number  whose  denominator  is  a  power  of  m  and  the  value 
of  ip°{x)  is  known  for  integer  values  of  x.  Latto  and  Pollen  independently 
observed  that  the  construction  in  the  previous  section  leads  to  a  unique 
definition  for  the  value  of  (p°(x)  whenever  x  is  any  rational  number.  Here 
is  how  it  works:  The  base  m  expansion  of  a  number  x  ultimately  repeats 
some  sequence  of  digits  if  and  only  if  x  is  ration^d.  Suppose  that 


*  =  n.*i . . .  xjri . . .  rj(ri . . .  rx  •  • .  n  . . .  r/f . . . , 
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where  ri . . .  rx  is  the  repeating  sequence,  and  introduce  the  abbreviation 

A  =  j4(ri)...^(rx). 

Then 

. . .  xjri . . .  rx  . . .)  =  . . .  rx  •  •  •) 

reduces  the  problem  to  computing  ♦°(0.ri . . .  rx  -  •  .)•  But  ♦°(0.ri . . .  rx  . . .) 
satisfies 

♦?(0.ri . . .  rx  .  - =  j44°(0.ri . . .  rx  • .  .)• 

This  equation  shows  that  4°(0.ri  ...rx---)  is  an  eigenvector  of  A  with 
eigenvalue  I,  so  that  is  uniquely  determined  up  to  a  normalization 

factor  by  the  scaling  recursion  if  x  is  rational  and  the  1-eigenspace  of  A  is 
one-dimensional.  The  normalization  factor  can  be  specified  by  the  require¬ 
ment  that^„  ^°(n  -1-  O.ri . . .  rx )  =  which  is  a  discrete  version  of  the 
formula  <p°{x)dx  =  1. 


4.7.2  Integration 

Definite  integrals  of  scaling  functions  and  wavelets  can  be  exactly  evaluated 
if  the  limits  of  integration  have  finite  expansions  in  base  m,  where  m,  as 
usual,  is  the  rank.  Equally  remarkable  is  that  a  sequence  of  increasingly 
fine  finite  approximations  (which  are  special  cases  of  the  sums  that  define 
the  Riemann  integral)  are  all  equal  to  each  other  and  to  the  infinite  limit 
which  is  the  integral  itself.  This  means  that  the  integrals  can  be  exactly 
calculated  from  the  finite  approximations,  and  justifies  the  last  formula  in 
the  previous  section.  This  property  is  one  of  the  important  links  between 
discrete  analysis  and  continuous  analysis  that  wavelets  provide. 


4.7.3  Formal  Derivatives 

Compactly  supported  wavelets  of  genus  greater  than  1  cannot  be  infinitely 
differentiable.^  This  is  one  of  the  ways  they  differ  from  ordinary  functions 
such  as  polynomials  or  the  elementary  transcendental  functions.  For  in¬ 
stance,  D2,  the  simplest  Daubechies  scaling  functbn,  is  not  differentiable 
at  any  of  its  points. 

Consider  the  scaling  function  ^(x)  for  a  wavelet  matrix  o  of  rank  m  =  2, 
which  satisfies  the  recursion 

V>(*)  =  53  -  *)■  (4-36) 

*The  Haar  wavelet  u  apecial.  It  ie  infiiiitdy  diflerentiabie  everywhere  with  the  excep¬ 
tion  of  two  poinU- 
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We  have  already  seen  that  a  necessary  and  sufficient  condition  that  eq(4-36) 
have  a  solution  for  dyadic  rational  z  is  that 

>1(0)*(0)  =  A*(0).  (4-37) 

If  were  differentiable  of  order  s  then,  denoting  the  s-th  derivative  by 

VJ<*>(z)  =  2*  53  -  *)•  (4-38) 

If  eq  (4-38)  is  valid,  then  ip  is  said  to  have  a  formal  derivative  of  order 


s  =  -lgA. 


This  is  equivalent  to 


if  A  is  an  eigenvalue  of  eq(4-36). 

It  follows  that  the  eigenvector  $(*^(0)  exists  if  and  only  if  A  =  2“*  is  an 
eigenvalue  of  j4(0). 

Formal  differentiability  is  a  necessary  condition  that  the  scaling  function 
have  an  ordinary  derivative  of  the  prescribed  order.  Now  work  backward^: 
the  matrix  v4(0)  has  exactly  2g  eigenvalues  (counting  multiplicity  of  eigen¬ 
values);  hence,  the  scaling  function  can  have  at  most  2g  —  I  derivatives 
(since  one  of  the  eigenvalues  is  1,  corresponding  the  the  zero  derivative,  i.e. 
to  <p{x)  itself).  Not  all  of  the  possible  orders  of  differentiation  are  integers; 
fractional  derivatives  and  complex  derivatives  can  occur. 


Example.  The  eigenvalues  of  i4(0)  for  Daubechies’  wavelet  matrix  D2 
are 

1,  1/2,  (H-v^)/4,  (l->/3)/4, 

and  the  corresponding  orders  of  formal  differentiability  are 

0, 1,  -  lg(l  ^/3)/4  =  0.5500 . . . ,  -  lg(l  -  •v/3)/4  S  2.44998 ...  -  i4.53236 . . . 

The  formal  derivatives  <p^*\x)  are  displayed  in  figure  4-10  through  4-12. 
The  scaling  function  itself  is  the  zero  derivative.  Although  the  scaling 
function  is  not  differentiable  in  the  classical  sense,  it  does  have  a  well  defined 
formal  first  derivative. 


*At  every  n<io»e)  if  i4(0)  and  A(l)  have  oommoo  eigenvector  with  deaired  eigenvalue. 
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Figure  4-10:  Formal  derivative  of  order  1  of  the  D2  scaling  function  (rn=2, 
y=2). 


Example.  The  formal  derivatives  <p^'\x)  of  the  scaling  function  for  the 
wavelet  matrix  DZ  are  displayed  in  figure  4-13.  The  scaling  function  itself  is 
the  zero  derivative.  The  scaling  function  is  not  difierentiable  in  the  classical 
sense,  although  it  does  have  both  formal  first  and  formal  second  derivatives. 

4.7.4  Connection  Coefficients  for  Numerical  Differen¬ 
tiation 

Term-by-term  differentiation  of  a  Fourier  series  or  a  power  series  is  easy, 
because  the  derivative  of  a  basis  function  is  proportional  to  the  same  (for 
Fourier  series)  or  another  basis  function.  The  wavelet  series  for  the  deriva¬ 
tive  of  a  wavelet  basis  function  has  infinitely  many  terms  so,  at  least  at 
first  sight,  wavelet  series  would  not  appear  to  provide  a  practical  method 
for  representing  derivatives.  In  fact,  wavelet  series  turn  out  to  be  quite  com¬ 
patible  with  differentiation  for  theoretical  expansions  but  they  are  superior 
for  numerical  differentiation,  and  lead  to  superior  accuracy  and  stability  in 
the  numerical  solution  of  nonlinear  differential  equations.  In  this  section 
we  show  how  to  calculate  the  wavelet  expansion  of  a  derivative  in  terms  of 
wavelet  series.  We  will  restrict  our  discussion  to  rank  m  =  2  and  simplify 
notation  by  writing 

(p,{x)  :=  (p°(x  -  1),  il>{x)  :=  <p^(x),  :=  2»/V(y  *  -  *)• 
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Figure  4-1 1 :  Formal  derivative  of  order  0.5500 ...  of  the  D2  scaling  function 
(m=2,  g=2). 


Figure  4-12:  Formal  derivative  of  order  2.44998 ...  — 14.53236 ...  of  the  D2 
scaling  function  (m  =  2,jr  =  2). 
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The  wavelet  series  for  /  is 

/(*) = 2  **<^'(*)  52  53 

I  i>0  k 

Differentiating  this  formula  expresses  the  derivative  of  /  in  terms  of  the 
derivatives  of  the  basis  functions  and  V'>t-  Since  the  basis  fimcticxis  i/>jk 
are  defined  in  terms  of  ip,  we  wiU  begin  by  deriving  the  the  coefficients  for 
the  wavelet  expansion  few  dipifdz.  The  F  coefficients  in  the  series 


dip,{x) 

dz 

(4-39) 

j>0  k 

and 

H 

II 

= 12  + 52  52^it’‘ 

(4-40) 

j'>0  k' 

are  called  connection  coefficients.  Our  starting  point 

is  the  collection  of 

formulae 

rf  = 

(4A1) 

Ff*  == 

(4A2) 

a. 

9r 

II 

(4A3) 

ri'.*'  _ 

(4-44) 

The  oasic  assumptions  that  are  needed  to  find  the  numerical  values  of 
the  connection  coefficients  are: 

1.  The  basis  functions  have  compact  support; 

2.  The  integrals  that  appear  exist,  and  integration  by  parts  can  be  ap¬ 
plied  to  them; 

3.  The  basis  function  ^  satisfies  the  scaling  recursiem. 


Theorem.  The  connection  coefficients  Ff  satisfy  the  foUowing  formula: 
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(0  rf  =  -ri 

(••)  rf  =  r*-' 

(4-45) 

(««)  n  = 

These  relations  are  all  homogeneous  equations;  they  determine  the  con¬ 
nection  coefficients  only  up  to  a  common  factor.  An  additional  equation 
that  is  not  homogeneous  is  needed  to  specify  that  factor.  It  appears  to  be 
necessary  to  restrict  the  wavelet  matrix  in  order  to  obtain  an  inhomoge¬ 
neous  equation.  If  we  suppose  that  the  wavelet  matrix  has  genus  g  >  1  and 
that  f  xil){x)dx  =  0,  then  it  can  be  shown  that  the  function  f{x)  =  x  has 
a  (low  pass)  wavelet  series  that  only  involves  translated  scaling  functions. 
A  simple  calculation  shows  that 

X  =  Momi(^)  -  (4-46) 

I 

where  Momi(^)  =  /  x<p(x)dx,  and 

Momj(¥»)  =  |5^i:a2. 

^  k 

Differentiating  this  series  and  substituting  the  connection  coefficient  ex¬ 
pressions  yields 

•=EE  rf^t(x)  +  terms  involving^. 

I  k 

From  this  formula  one  easily  finds 

Theorem.  If  Mom„(V’)  :=  /  x’*i>{x)dx  =  0  for  n  =  0, . . . ,  d,  then 
(i)  =  -d„,i  for  0<n<2n-l-2 

(4-47) 

(“)  =0  for  0<n<n. 

There  are  similar  but  more  complicated  inhomogeneous  relations  for  the 
higher  moments  of  the  connection  coefficients. 

Example.  Suppose  that  9  =  2  and  that  the  wavelet  matrix  is  D2. 
Then 

(•••,rS,...)  =  ^(-1,8.0. -8.1).  (448) 

*  =  -2.-1.0.1.2andi;trJ  =  -l. 


where 
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Chapter  5 

A  Bestiary  of  Wavelets 


The  theory  of  compactly  supported  wavelets  has  produced  unusual  new 
classes  of  orthonormal  basis  functions.  Some  examples  of  these  wavelet 
bases  challenge  conventional  views  about  the  kind  of  mathematical  analysis 
that  is  suitable  for  digital  signal  processing.  Others  illustrate  some  of  the 
fine  points  of  the  theory  of  Lebesgue  integration  and  its  use  in  applications. 
And  some  of  the  new  functions  are  just  interesting  in  themselves. 

This  chapter  collects  examples  of  compactly  supported  scaling  functions 
and  wavelets  that  have  helped  the  author  to  understand  wavelets  and  de¬ 
velop  intuition  about  them. 

The  chr^ter  is  generally  organized  so  that  rank  m  =  2  is  discussed 
first,  beginning  with  genus  y  =  1,  for  orthonormal  bases  of  real  valued 
wavelet  functions.  This  discussion  is  followed  by  examples  of  complex 
valued  wavelets,  biorthogonal  wavelets,  wavelets  of  higher  rank,  including 
m  =  oo,  and  wavelets  of  higher  dimension. 


5.1  Five  Fundamental  Examples 

5.1.1  sine  Wavelets 

Shannon’s  Sampling  Theorem  implies  the  formula 

sine  (tx)  =  ^  sine  (»k/2)  sine  (x(2x  —  k)).  (5-1) 

k 

The  coefficients  sine  {irk (2)  are  the  same  as  the  low  pass  row  of  the  sine 
wavelet  matrix,  eq.(4-17).  Eq.(5-1)  shows  that  the  sine  function  is  a  rank 
m  =  2  scaling  function  of  infinite  genus.  Its  well  known  graph  is  displayed 
in  figure  5-1. 
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Figure  5-1;  The  sine  sc&ling  function  has  non-compact  support. 


5.1.2  Haar  Wavelets 

The  Haar  wavelet  basis  was  introduced  by  Alfred  Haar  in  his  doctoral 
dissertation  in  1910  [19].  The  Hear  scaling  function  is  the  simplest  of  all: 
the  function  is  zero  outside  the  interval  between  x  =  0  and  x  =  1;  if 
0  <  X  <  1,  it  is  equal  to  1;  and  ^(0)  =:  1|^(1)  —  0.  Figure  5-2  illustrates 
the  Haar  scaling  function  over  the  interval  from-1  to  2,  and  figure  5-3  shows 
the  graph  of  the  Haar  wavelet  on  the  same  interval. 

The  Haar  basis  corresponds  to  the  wavelet  matrix 

(l-O- 

The  Haar  scaling  function  is  a  compactly  supported  solution  of  the  scaling 
equation 

<p(x)  =  (p(2z)  +  y>(2x  -  1).  (5-2) 

Other  compactly  supported  solutions  of  eq.(5-2)  differ  only  at  the  endpoints 
of  (0, 1).  Up  to  a  normalizing  factor,  the  general  solution  is 

^(x)  =  1  for  0  <  X  <  1,  ^(0)  arbitrary.  (5-3) 

It  is  not  well  known  that  this  simple  scaling  equation  has  uncountably 
infinitely  many  other  solutions  that  do  not  have  compact  support.  Figure 
5-4  displays  the  graph  of  one  of  them,  and  the  following  figure  zooms  in  on 
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Figure  5-2:  Standard  Faar  scaling  function  with  compact  support. 


Figure  5-3:  Standard  Hear  wavelet  with  compact  support. 
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Figure  5-4:  Hear  scaling  function  with  noncompact  support. 


the  region  of  the  graph  centered  on  the  interval  [0, 1].  This  curious  function 
is  continuous  except  at  the  points  x  =  0  and  x  =  -1;  more  remarkable,  it 
is  piecewise  linear! 

5.1.3  Car  Alarm  Wavelets 

The  CarAlarm  wavelets^  arise  from  wavelet  matrices  but  they  do  not  form 
orthonormal  bases  for  L^{R).  Daubechies  [9]  used  CarAlarmB  (figures  5-6 
and  5-7),  whose  scaling  vector  is  (1,0, 1,0),  to  prove  that  wavelet  matrices 
do  not  always  produce  orthonormal  bases  for  L^(R)-  CarAlarm  scaling 
functions  are  nonnegative:  they  assume  only  the  values  zero  and  one.  In 
the  figures  CarAlarm  wavelets  are  graphed  for  all  dyadic  rational  arguments 
X  =  ib/2^  where  j  <  J,  and  J  is  indicated  in  the  caption  of  the  figure. 

Figure  5-8  illustrates  the  modulus  of  the  Fourier  transform  of  Car- 
Alarm2:  it  is  evidently  proportional  to  the  modulus  of  the  Fourier  transform 
of  the  standard  Faar  scaling  function,  i.e.  to  the  modulus  of  the  sine  func¬ 
tion,  whose  gr^h  is  shown  in  the  same  figure  as  the  thick  gray  curve.  This 
suggests  Daubechies’  argument,  that  CarAlarmS  is  just  a  stretched  version 
of  the  Haar  scaling  function  supported  on  [0,3],  which  is  too  wide  to  be 
orthogonal  to  its  integer  translates.  This  argument  would  be  valid  were  the 
sine  function  integrable,  for  then  its  inverse  Fourier  transform  would  be 

‘Named  by  David  Phimmer,  who  diaoovered  that  is  what  these  wavekti  sound  lihe 
when  played  through  the  computer's  loudspeaker. 


function  with  noncompact  support  (Zoomed). 


:  CarAlarmS  scaling  function;  J  =  A. 
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Figure  5-7:  CarAlarm2  wavelet;  J  —  A. 


equal  to  the  CaM/arm^  function.  But  sincx  dies  off  too  slowly  to  belong 
to  so  a  more  subtle  analysis  is  required. 

If  one  considers  the  restriction  of  CarAlarmS  the  dyadic  rational  num¬ 
bers,  where  its  values  are  given  by  the  scaling  recursion,  then  it  can  be 
shown  that  this  discrete  CarAlarmS  function  is  orthogonal  to  its  integer 
translates  with  respect  to  a  Riemann  sum-like  inner  product.  This  interest¬ 
ing  observation  suggests  employing  a  theory  of  integration  weaker  than  the 
Lebesgue  theory  and  similar  to  the  Riemann  sum  approximations  employed 
in  Riemann  integration  in  order  to  retain  orthogonality  for  the  CarAlarm 
wavelets. 

The  2g  eigenvalues  of  j4(0)  for  CarAlarmg  are  roots  of  unity  whose  order 
divides  2(^g  —  1).  Sometimes  this  order  equals  2{g  —  1),  as  shown  figure  5-9 
for  9  =  15.  The  eigenvalue  1  always  occurs  with  multiplicity  greater  than 
one.  CarAlarmS  is  a  rank  m  =  2,  genus  g  =  3  wavelet  matrix  whose  scaling 
vector  is  (0,0, 1,0,0, 1).  The  corresponding  scaling  function  is  displayed  in 
figure  5-10. 

5.1.4  NonNegative  Wavelets 

The  CarAlarm  wavelets  are  non  negative  but  their  values  are  systemat¬ 
ically  zero  on  a  lattice  of  dyadic  rational  numbers.  Since  these  func¬ 
tions  are  clearly  out  of  the  ordinary,  it  not  surprising  to  learn  that  Car- 
Alarm  Vi  not  integrable,  and  that  it  does  not  produce  an  orthonOTmal  basis 
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Figure  5-8:  Modulus  of  the  Fourier  transform  of  the  CarAlarmB  scaling 
function,  in  dB. 
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Figure  5-10:  CarA/arm5  scaling  function;  J  —  A. 


Figure  5-11:  CarAlarmS  wavelet;  J  =  A. 
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Figure  5-12:  Modulus  of  the  Fourier  transform  of  the  CarAlarmS  scaling 
function,  in  dB  (m  =  2,9  =  3). 


in  the  sense  of  i*(R)  orthogonality.  The  class  of  wavelets  called  Non- 
Negative  arise  from  wavelet  matrices  that  lie  near  the  CarAlarm  wavelet 
matrices  in  the  metric  of  the  wavelet  matrix  parameter  space.  There 
are  infinitely  many  wavelet  matrices  of  this  class.  A  NonNegative  scal¬ 
ing  function  appears  to  be  non  negative,  i.e.  its  graph  appears  to  lie 
on  or  above  the  z-axis.  Its  graph  can  appear  to  be  relatively  smooth  as 
well,  although  this  is  misleading.  What  is  more  important  is  that  these 
wavelet  matrices  satisfy  Lawton’s  criterion  for  producing  an  L’(R)  or¬ 
thonormal  basis.  The  orthogonality  of  the  overlapped  scaling  functions 
and  their  apparent  positivity  seem  to  be  contradictory.  For  the  Non- 
Negative  wavelet  matrix  near  CarAlarmS  whose  scaling  vector  is  a”  = 
(0.08002,9.3954  x  10-®,0.92123, -0.06794, -0.00125, 1.06785),  the  corre¬ 
sponding  scaling  function  is  shown  in  figure  5-13;  it  certainly  appears  to 
be  "non  negative.”  But,  if  z  =  k/V  and  j  >  15,  then  ^(z)  assumes  nega¬ 
tive  values.  These  negative  values  at  fine  scales  provide  the  wthogonality 
required  by  theory,  but  they  are  not  accessible  to  measurement. 

The  figure  is  also  misleading  because  consecutive  points  are  connected. 
If  the  figure  displays  only  the  computed  p<nnts,  the  result  looks  like  figure 
5-14: 

The  fdlowing  two  figures  show  the  modulus  and  argument  of  the  Fourier 
transform  for  the  scaling  function  NonNegative  . 

A  scaling  function  that  is  much  closer  to  CarAlarmS  and  shows  how 
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Figure  5-13;  The  scaling  functbn  NonNegaiive  {m  =  2,g  =  2\J  =  6). 


Figure  5-14:  The  same  NonNegaiive  scaling  function:  only  computed  points 
are  shown,  (m  =  2,^  =  2;  7  =  6). 


3 


104 


An  Essay  on  Wavelet  Technology 


Aware,  &c.  AD921018 


Figure  5-17:  Argument  of  the  Fourier  transform  of  the  scaling  function 
NonNtgaiive. 


NonNegaiive  functions  arise  is  the  scaling  function  NonNegative(a).  Its 
scaling  vector  is,  correct  to  order  10“**, 

a°  =  (1.5x  10“*®,0,l-1.5xl0“**,-1.5x  10“**,10-*®,H-1.5x  10'**). 

The  gri^h  of  the  scaling  function  and  wavelet  are  shown  in  figures  5-18 
and  5-19. 

The  following  two  figures  show  the  modulus  and  argument  of  the  Fourier 
transform  for  the  scaling  function  NonNegaUve(a). 

5.1.5  Daubechies  Wavelets 

The  Daubechies  wavelets  lie  at  another  extreme  in  the  space  of  all  wavelets, 
for  these  wavelets  are  as  smooth  as  a  wavelet  can  be,  and  they  provide  high 
order  polynomial  approximations.  This  is  why  they  have  been  so  often  used 
for  applications  where  appraximati(»  and  interpolation  are  important. 

The  Fourier  transform  of  the  17aar  scaling  function  differs  from  sine  rx  := 
sin  irx/xx  by  a  phase  factor,  so  the  modulus  of  sine  is  the  modulus  of  the 
Fourier  transform  of  the  Bear  scaling  function,  which  is  denoted  Dl  to 
remind  us  of  its  relationship  to  the  Daubechies  wavelets. 
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Figure  5-18:  The  scaling  function  NonNegative(a);  only  computed  points 
are  shown,  (m  =  2, 9  =  2;  J  =  6). 


Figure  5-19:  The  NonNegaiive(a)  wavelet  (m  =  2,y  =  2;  J  =  6). 


Figure  5-21:  Argument  of  the  Fourier  transform  of  the  scaling  function 
NonNegaUve(a). 
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Figure  &-22:  Modulus  of  the  Fourier  transftvm  of  the  Dl  sc&ling  function, 
in  dB.  i.e.  1 8inc(Tx)|. 
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Figure  5-24:  Argument  of  the  Fourier  transfcwm  of  the  D2  scaling  function. 
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Figure  5-26:  Daubechies  scaling  function  D5  (m  =  2,$  =  5). 


Figures  5-29  and  5-30  display  the  scaling  function  and  wavelet  for  the 
Daubtchits  wavelet  matrix  of  genus  g  =  15.  The  basis  formed  from  these 
functions  can  perfectly  interpolate  polynomials  up  to  degree  14.  While  the 
scaling  function  is  supported  on  an  interval  of  length  29,  figure  5-29  shows 
that  the  effective  support,  i.e.  the  length  of  the  interval  where  the  function 
is  significantly  different  from  zero,  is  only  about  10  units  long. 


5.2  Real  Orthogonal  Wavelets 

5.2.1  A  wavelet  basis  of  genus  g^2. 

There  are  only  two  essentially  distinct  rank  m  s  2  wavelet  matrices  of  genus 
9  =  2  that  lead  to  formally  differentiable  wavelets:  one  is  the  Daubechies 
wavelet  matrix  D2;  the  other  is  a  wavelet  matrix  called  Ri2.  The  Ri2 
wavelet  matrix  is 

_1  f  1  l  +  y/2  1  l--v/2 

“"2V1-V2  -1  1-l-v^  -1  ) 
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Figure  5-33:  The  scaling  function  Bat  (m  =  2,g  ~  3). 


5.2.2  Hat  Wavelets 

Scaling  functions  derived  from  a  wavelet  matrix  can  never  be  symmetric, 
but  they  can  be  very  nearly  sysmmetric.  The  wavelet  system  Hat  is  an 
interesting  example  for  genus  9  =  3;  the  scaling  function  and  wavelet  are 
illustrated  in  figure  5-33.  The  scaling  vector  for  Hat  is 

a°  =  (-0.0943,0.4966,1.2066,0.5246,-0.1123,-0.0213). 

5.3  Flat  Wavelets 

If  the  entries  of  the  wavelet  matrix  a  are,  up  to  a  common  factor,  equal  to 
±1  we  say  that  a  is  a  flat  real  wavelet  matrix;  if  the  entries  of  a  are,  up 
to  a  common  factor,  complex  numbers  of  modulus  1  we  say  that  o  is  a  flat 
complex  wavelet  matrix. 

The  flat  real  matrices  of  genus  9  =  1  ve  Hadamard  matrices,  and 
the  flat  complex  matrices  of  genus  9  =  1  are  (up  to  a  normalizing  factor) 
modifications  of  the  Finite  Fourier  IVansform  matrix. 
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Figure  5-34:  The  wavelet  Sat  (ir.  =  2,g  =  3). 


Figure  5-35:  Modulus  of  the  Fourier  transform  of  the  scaling  function  ip  for 
Eat  (m  =  2,5  =  3). 
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Figure  5-36:  The  scaliug  function  for  a  flat  wavelet  (m  =  2,9  s  4). 


Figure  5-37:  The  modulus  of  the  Fourier  transform  of  the  previous  scaling 
function,  in  dB  (m  =  2,g  s  4). 
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Figure  5-38:  The  argument  of  the  Fourier  transform  dl  the  previous  scaling 
fimction,  in  dB  (m  =  2,g  =  4). 
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Figure  5-40;  The  scaling  function  for  a  flat  wavelet  (m  =  2,y  =  16). 


5.4  Complex  Orthogonal  Wavelets 

5.4.1  Lawton-Resnikoff'Wave\et8 

The  real  and  imaginary  parts  of  compactly  supported  complex  wavelets  can 
be  symmetric,  in  contrast  to  real  compactly  supported  wavelets.  The  rank 
m  =  2,  genus  g  =  Z  complex  wavelet  matrix  given  in  eq.  (4-26)  produces  the 
syimnetric  complex  scaling  functions  and  skew  symmetric  wavelet  displayed 
in  figures  5-45  -  5-49. 
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Figure  &-41:  The  modulus  of  the  Fourier  transform  of  the  previous  scaling 
function,  in  dB  (m  =  2,^  =  16). 
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Figure  5-43:  The  wavelet  for  the  same  flat  wavelet  matrix  (m  =:  2,y  =  16). 


Figure  6-44:  Geometry  of  the  eigenvalues  for  a  real  flat  wavelet  matrix  cS 
rank  m  =  2  and  genus  g  =  16. 


•  •• 


•  • 
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5-47:  The  real  part  of  the  Lawton-Resnikoff  complex  wavelet,  m  = 
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Figure  5-48:  The  imaginary  part  of  the  Lawton-Resnikoff  complex  wavelet, 
m  =  2,g  =  3. 


Figure  5-49:  Modulus  of  the  Fourier  transform  of  the  Lawton~Resnikoff 
scaling  function,  m  =  2,y  =  3. 
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Figure  5-50:  Cohen- Daubeckiea-Fcauveau  left  biorthogonal  scaling  function 
(m  =  2,s  =  4). 


5.5  Biorthogonal  Wavelets 

Biorthogonal  wavelets  are  a  generalization  of  the  wavelet  idea  obtained  by 
weakening  the  assumptions  that  constrain  the  wavelet  matrix.  In  general, 
these  ‘Veakened  wavelets”  have  little  to  offer  that  is  really  new.  But  there 
is  one  case  where  biorthogonal  wavelet  matrices  and  biorthogonal  wavelet 
functions  do  add  something:  biorthogoral  wavelets  can  be  symmetrical.  An 
important  example  was  discovered  by  Heller  [23],  whose  complex  biorthog¬ 
onal  wavelets  are  illustrated  in  figures  5-56  -5-60.  The  wavelet  matrix  was 
given  in  eq(4-28). 

5.5.1  Cohen-Daxibechies-Feauveau  Wavelets 


The  following  real  biorthogonal  wavelet  matrix  pair  of  genus  ;  =  8  is 
particularly  useful  for  image  compression  when  the  amount  of  computation 
is  not  a  major  constraint. 
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Figure  5-55:  ^  ^  for  a  biorthogonal  wavelet  matrix  pair  of  genus  g  =  6 
(m  =  2,5  =  8). 


The  absolute  value  of  the  difference  ^  ^  is  less  than  0.02. 

5.5.2  Heller  Wavelets 

Heller’s  wavelets  are  complex  biorthogonal  functions  with  useful  symme¬ 
tries.  The  simplest  example  is  for  9  =  3.  The  wavelet  matrix  given  in  eq. 
(4-28)  produces  the  scaling  function  smd  wavelets  shown  in  figures  5-56  - 
5-60. 
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Figure  5-56:  Real  part  of  the  6-coefBcient  Eeller  bicwthogonal  scaling  func¬ 
tion  (fi, 


Figure  5-57:  Imaginary  part  of  the  6-coefficient  Heller  biorthogonal  scaling 
function  y>L 
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Figure  5-58;  Modulus  of  the  Fourier  transform  of  the  BtUer  scaling  func¬ 
tion,  m  =  2, 9  =  3. 


5.6  Negative  Multiplier  Dauheckies  Wavelets 

The  equations  that  define  a  wavelet  matrix  can  be  extended  to  negative 
muhiphers.  Let  m  be  an  integer  and  suppose  that  |m|  >  1.  The  extended 
equations  defining  a  wavelet  matrix  are 

SaI+mf'«*+mJ  =  (5-4) 

k 

The  linear  constraint  is 

=  (5-5) 

k 

m  is  called  the  multiplier,  the  rank  is  |m|.  It  is  easily  seen  that  the  solutions 
of  these  equations  for  negative  multiplier  m  coincide  with  the  solutions  for 
the  positive  multiplier  |m|.  Although  the  wavelet  matrices  are  the  same  for 
positive  and  negative  multipliers  of  the  same  rank,  the  scaling  functions  and 
wavelets  are  different.  Recall  that  the  defining  equations  for  these  functions 
are 

k 

Figure  ??  displays  the  scaling  function  for  multiplier  m  =  — 2  with  wavelet 
matrix  D2.  Comparison  with  figure  4-1  shows  that  the  negative  multiplier 
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Figure  5-59:  Real  part  of  the  6-coefficient  HtUtr  biwthogonal  scaling  func¬ 
tion 


Figure  5-60:  Imaginary  part  of  the  6-coefficient  17c//er  biorthogonal  scaling 
function 
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Figure  5-61:  Daubechies  wavelet  with  negative  multiplier  (m  =  —2,g  =  2). 


scaling  function  is  more  symmetrical  and  appears  to  the  eye  to  be  smoother 
than  Daubechies’  function,  although  it  shares  the  property  that  polynomials 
of  degree  1  can  be  represented  as  a  low  pass  sum  of  linear  combinations  of 
the  negative  multiplier  scaling  function  and  its  integer  translates. 

The  Fourier  transforms  of  the  two  scaling  functions  differ  only  in  phase. 


5.7  Higher  Rank  Wavelets 

5.7.1  Pollen- Plummer  Higher  Rank  Wavelets 

ThePo//en-P/ummer  higher  rank  scaling  functions  are  generalizations  of  the 
rank  2  scaling  functions  discovered  by  Daubechies.  They  satisfy  the  scaling 
recursion 

m0-l 

V’(*)  =  S  a°ip{mz  -  k) 
k=0 

and  all  moments  of  order  less  than  g  of  each  fundamental  wavelet 

mf-l 

:=  ]C  -  *) 

t=o 


(5-6) 


132  Ail  Essay  on  Wavelet  Tecbnohgy  Aware,  Inc.  AD921018 

Figure  5-62:  Formal  derivative  of  Dauhtchies  wavelet  with  negative  multi¬ 
plier  (m  =  -2,  j  =  2). 


vanishes  for  0  <  r  <  m.  For  genera  g  —  2  and  p  =  3  Heller  has  found 
all  possible  ways  of  completing  the  first  row  of  the  corresponding  rank  m 
wavelet  matrix  to  a  full  rank  m  wavelet  matrix. 

The  Pollen- Plvmmer  scaling  functions  for  genus  g  =  2  and  multipliers 
m  =  3, 5  and  oo  are  illustrated  in  figures  5-66  and  5-67. 


As  m  — »  oo  these  scaling  fiinctions  tend  to  a  limit  which  coincides 
with  the  limit  of  the  first  row  of  the  wavelet  matrix.  In  terms  of  the 
variable  z  =  k/m,  and  letting  k  and  m  approach  oo  so  that  their  ratio 
tends  to  a  finite  limit  yields  a  function  ^oo(>)  of  the  continuous  variable 
z  (displayed  in  figure  5-68  )  that  is  supported  on  [0,2)  and  generates  a 
complete  orthonormal  system  for  Ltwo(R)  in  the  usual  way.  In  particular, 
the  set  {^oo(z  —  /)  :  I  €  Z)  of  translations  of  the  scaling  functions  are 
orthonormal,  and  every  polynomial  of  degree  1  can  be  expressed  as  a  linear 
combination  of  them. 
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Figure  5-64:  Dauhechies  acaling  fiuction  with  negative  multiplier  (m  = 
-2,9  =  3). 


Figure  5-65;  Dauhechies  scaling  function  with  negative  multiplier  (m  = 
-2,9  =  3). 
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5.8  Higher  Dimensional  Wavelets 


5.8.1  Irreducible  Two-Dimensional  Wavelets 


A  is  a  lattice  in  the  complex  plane  and  /i  €  A  is  a  complex  number  such 
that 


MCA 


A  2-dimensional  wavelet  matrix  is  a  ^/i  x  g/ifi  matrix  a  of  complex  numbers 


that  satisfies  the  wavelet  scaling  conditions 
EacaOa  = 

Sa6A^+»*«''®a+»*»'  ~  fills' 


(5-7) 
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Matrix  Formulation 
The  mapping 

is  a  2-dunensional  real  representation  of  the  field  of  cmnplex  numbers  C. 
It  is  easily  verified  that 

|z|*  =  zr  =  detT(z) 


and 

z  +  z  =  tr  x(z). 

The  scaling  equation  can  be  written  in  terms  of  addition  and  multiplication 
in  the  matrix  ring. 

Consider  multipliers  ft  €  C  such  that  2tr  x(/i)  and  m  :=  detx(/i)  are 
integers.  These  multipliers  can  be  classified  according  the  values  of  the 
determinant  and  trace.  The  case  where  m  =  1  is  not  interesting.  For 
m  =  2  there  are  three  possibilities. 

The  Three  Cases  for  m  —  2 
If  tr  ir(ft)  =  0  then 


are  the  only  possibilities. 
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Figure  5-69:  One-dimeiuional  to  two-dimensional  mapping  for  the  Rectan¬ 
gle  ^asr  scaling  function  (p{z)  (m  =  2,g  =  1). 


Figure  5-70:  Novon  Saar  scaling  function  y>(z)  (m  =  2,  j  =  1). 


•-  •• 
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Figure  5-71:  One-dimensional  to  two-dinrtensional  mapping  for  the  ffovon 
Baar  scaling  function  ip{z)  (m  =  2,g  =  1). 


Figure  5-72:  Twindragon  Baar  scaling  function  (p{z)  (m  =  2,ff  =  1). 
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Figure  5-73:  Two  dimensional  row  and  column  transformed  image  Lenna 
image  employing  the  D3  product  basis. 


An  Eaiay  on  Wavelet  Technology  Aware,  Inc.  AD921018 


Part  III 


Applications  of  Wavelet 
Technology  to  Bandwidth 
Management 


Chapter  6 


Bandwidth  Compression 


6.1  Image  Compression 

6.1.1  Imagery 

The  human  eye  and  the  ear  capture  an  amazing  amount  of  data  every  sec¬ 
ond  -  much  more  than  the  brain  can  consider.  For  instance,  in  normal 
daylight  vision,  the  color  sensitive  receptors  of  the  retina  in  each  eye  collect 
about  800  megabits  of  visual  information  each  second.  The  first  step  in 
making  what  the  eye  sees  available  to  the  brain  is  the  reduction  of  this 
enormous  amount  of  data  by  selective  omission  of  less  important  informa¬ 
tion.  The  neural  network  that  connects  the  eye  to  the  visual  cortex  does 
this  in  real  time,  continuously  and  automatically.  This  automatic  culling 
or  “compression”  reduces  the  amount  of  information  that  reaches  the  brain 
by  a  factor  of  at  least  100.  This  fact  shows  that  roost  of  the  physical  infor¬ 
mation  represented  in  the  light  signal  that  falls  on  the  eye  is  not  needed  by 
the  brain  -  this  basic  fact  about  vision  is  the  psychophysical  cornerstone  of 
practical  compression  algorithms  for  video  and  other  natural  images. 

6.1.2  Measures  of  Image  Quality 

The  decompressed  image  produced  by  a  lossy  compression  process  is  not  the 
same  as  the  original.  The  problem  of  defining  image  quality  is  to  provide 
a  method  for  comparing  different  image  compression  methods.  Although 
this  question  is  a  natural  one,  a  satisfactory  answer  still  eludes  the  research 
community. 

For  many  years  engineers  adopted  “mean  squared  error”  as  a  measure  of 
quality.  For  a  decompressed  gray  scale  image,  the  mean  squared  err«  is  the 
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Figure  6-1;  Simultaneous  contrast  enhances  edges. 


sum  of  the  squares  of  the  differences  between  the  corresponding  pixel  values 
of  the  source  image  and  the  decompressed  image.  MathematicaUy,  this  is 
the  discrete  I'^(R)  norm  of  the  difference  of  the  two  images;  physically, 
the  error  measures  the  energy  in  the  difference  image.  The  mean 

squared  error  per  pixel  is  often  used  to  compare  the  quality  of  images  of 
different  size. 

Misplaced  image  energy  is  certidnly  a  quality  measure:  if  the  misplaced 
energy  is  large,  the  quality  cannot  be  high.  But  a  small  error  need 

not  imply  that  the  two  images  are  subjectively  interchangeable.  There  are 
several  reasons  for  this.  Nonlinearities  in  the  human  visual  system  result 
in  a  nonuniform  weighting  of  different  types  of  visual  phenomena  which  is 
not  ciq)tured  by  the  I>^(R)  norm.  Moreover,  if  the  energy  in  the  difference 
image  is  localized,  it  may  ^>pear  as  an  artifact  but  if  the  error  energy 
is  more  or  less  randomly  distributed  throught  the  image,  it  may  not  be 
perceived  at  all.  Image  compression  experts  know  that  mean  squared  error 
is  not  proportional  to  subjective  image  quality,  but  it  is  frequently  used  as 
a  substitute  nevertheless. 

The  maximum  error,  i.e.  the  largest  of  the  absolute  values  of  the  pixel 
differences,  is  a  measure  that  may  correspond  better  to  subjective  quality 
assessments. 

For  a  fixed  quality  level,  the  amount  of  compression  that  is  inherent 
in  an  image  depends  on  the  size  of  the  image  and  the  rate  at  which  it  is 
sampled. 

Images  sampled  at  higher  resolution  have  greater  redundancy  because 
uniform  regions  of  the  image  account  for  a  greater  number  of  samples.  For 
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Figure  6-2:  IVade-off  between  compression  ratio  and  image  size  for  three 
fixed  levels  of  quality  for  Aware’s  compression. 


ffiMiiiiiii  muo 


a.  !•  4.  14  4.  It  •.  It  1.  10  t.a  to  1.4  to 


a  fixed  level  of  absolute  resolution,  natural  images  and  images  of  a  larger 
scene  typically  have  greater  redundancy  than  images  of  a  smaller  scene 
because  of  the  local  uniformity  of  physical  objects. 

Let  AT  denote  the  number  of  pixels  in  a  rectangular  image.  We  wiU 
normalize  storage  requirements  so  that  the  amount  of  computer  memeory 
required  to  store  one  pixel  of  image  information  is  1  storage  unit.  Let  C(A^) 
be  the  number  of  storage  units  required  to  store  a  compressed  version  of 
the  image.  Then  N/C{N)  is  the  compression  ratio.* 

Figure  6-2  displays  three  graphs  that  show  how  C(N)  varies  with  N  for  a 
fixed  image  sampled  at  increasing  resolution  and  compressed  by  a  wavelet 
image  compression  algorithm.  The  graphs  were  constructed  by  starting 
with  a  digital  image  sampled  at  a  high  resolution.  To  obtain  subsampled 
lower  resolution  versions  of  the  image,  it  was  partitioned  into  2x2  blocks 
of  adjacent  pixels  which  were  averaged.  The  average  value  was  assigned 
to  each  of  the  pixels.  This  procedure  was  repeated  a  number  of  times  to 
reduce  the  sampling  resolution. 

Each  curve  in  the  figure  corresponds  to  a  different  subjective  quality 
measure.  The  lowest  curve  corresponds  to  compressed  image  whose  de¬ 
compressed  form  is  visually  identical  to  the  ori^al;  the  next  higher  curve 
corresponds  to  decompressed  images  that  are  visually  different  from  the 
original  but  do  not  produce  ’Hinacceptable”  artifacts.  The  highest  curve 

*  Here  we  are  amittiiig  the  overhead  atoragc  required  to  deccnaprcaa  the  image.  Since 
thia  is  -  ior  wavelet  compression  -  a  small  fractkm  of  the  total,  we  shall  ignore  it. 
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corresponds  to  decompressed  images  that  appear  natural  when  viewed  alone 
but  have  obvious  natural  looking  artifacts  when  ccHnpared  with  the  original. 

The  first  and  third  curves  appear  to  be  asymptotically  linear,  and  the 
second  appears  to  lie  between  the  other  two.  If  we  assume  that  the  curves 
are  asymptotically  linear,  this  implies  the  relation 

N/CiN)  -  (a  +  N)/p 


for  constants  a  and  at  least  for  N  very  large,  so 

As  N  —*  oo,  we  find 

Jim  C{N)^p. 

iV— *00 


(6-1) 


This  formula  states  that  if  is  sufficiently  large,  then  there  is  no  further 
information  gained  by  sampling  the  image  at  increasingly  fine  scales. 


6.2  Transform  Image  Compression  Systems 

6.2.1  System  Block  Diagram 

The  block  diagram  of  a  transform  image  compression  system  is  shown  in 
figure  6-3. 

A  transform  compression  encoder  consists  of  three  subsystems:  a  trans¬ 
form  subsystem;  a  gsanftzafton  subsystem,  and  an  entropy  coder  subsystem. 
As  shown  in  the  figure,  the  input  signal  progressively  passes  through  these 
subsystems. 

The  IVansform 

The  transform  is  invertible;  no  information  is  lost  in  this  stage  of  the  com¬ 
pression  process.  The  transform  merely  reorganizes  the  input  image  data  in 
a  way  that  makes  the  quantization  stage  more  effective.  Since  the  transform 
is  invertible,  it  is  not  directly  responsible  for  any  compression:  for  a  gray 
scale  image,  for  instance,  the  transformed  image  provides  one  transform 
coefficient  for  each  input  pixel  value.’ 

The  purpose  of  the  transform  is  to  reorganize  the  image  data  to  con¬ 
centrate  large  scale/low  spatial  frequency  image  signal  energy  into  a  small 
number  of  coefficients  for  subsequent  quantization.  Selection  of  a  transform 
is  governed  by  the  following  criteria: 

^The  trandonn  may  diange  the  dynamic  range  of  the  input  value*,  and  thereby 
require  more  or  lea*  *torage  for  a  transfonn  coefficient.  We  ahall  ignore  thi*  factor  in  our 
diacu**ion. 
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Figure  6-3:  Diagram  of  a  transform  image  compression  system. 


Transform  Image  Compression 
— - —  System  Diagram _ 


tawwnk  liiBiaii 
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•  Transforms  that  concentrate  large  scale/low  spatial  frequency  energy 
better  are  preferred. 

•  Transforms  that  are  computationally  efficient  are  preferred. 

•  Transforms  that  retain  localized  image  information  (e.g.  edges,  spec¬ 
ular  reflection)  in  the  low  frequency  transform  coefficients  are  pre¬ 
ferred. 

The  Quantizer 

The  quantizer  allocates  different  numbers  of  bits  for  the  representation  of 
different  transform  coefficients.  The  allocation  of  zero  bits  is  the  same 
as  omitting  the  transform  coefficient.  Typically,  the  transform  coefficients 
that  correspond  to  the  highest  spatial  frequencies  will  be  omitted.  Since 
noise  populates  all  spatial  frequency  states,  and  there  are  more  high  spa¬ 
tial  frequency  states  than  low,  it  follows  that  omission  of  the  high  spatial 
frequency  transform  coefficients  preferentially  eliminates  noise  and  hence 
may  actually  enhance  the  ^pearance  of  the  image.  A  more  refined  ap¬ 
proach  to  quantization  enables  some  high  spatial  frequency  information  to 
be  retained  at  the  expense  of  low  spatial  frequency  information. 

The  quantizer  provides  all  of  the  lossy  compression  -  quantization  is  not 
invertible.  The  selection  of  a  quantizer  is  governed  by  the  following  criteria; 

•  Quantizers  that  provide  a  good  model  of  the  source  imagery  are  pre¬ 
ferred. 

•  Quantizers  that  incorporate  a  good  human  psycbovisual  model  are 
preferred. 

•  Quantizers  whose  quantization  rules  are  computationally  efficient  are 
preferred. 

The  Entropy  Coder 

The  final  step  in  the  compression  process  is  removal  of  redundancy  from  the 
stream  of  quantized  transform  coefficients  by  means  of  an  entropy  coder. 
Entropy  coding  is  invertible:  no  information  is  lost  in  this  stage  of  the 
compression  process.  Neverthelesss,  when  entrc^y  coding  is  ^plied  to  a 
suitably  transformed  and  quuitized  image,  it  provides  a  significant  com¬ 
pression  increment  -  there  can  be  as  much  as  a  30%  reduction  in  the  size 
of  the  compressed  file.  Entropy  coders  are  not  all  equal.  The  selection  of  a 
quantizer  is  governed  by  the  foUowing  criteria: 
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Figure  6-4:  Discrete  cosine  transform  ("DOT”)  matrix  for  JPEG:  m  = 
8,g  =  l. 


•  Entropy  coders  that  limit  the  number  of  pixels  that  can  be  affected 
by  communications  channel  errors  are  preferred. 

•  Entropy  coders  that  have  smaller  memory  and/or  processing  require¬ 
ments  are  preferred. 

6.2.2  Conventional  Block  IVansform  Image  Compres¬ 
sion 

Conventional  block  transforms  have  been  used  for  image  compression  for 
several  decades.  Today,  the  best  known,  if  not  most  effective,  block  trans¬ 
form  is  the  discrete  cosine  transform  (“DCT”)  [47].  The  ISO  JPEG®  still 
image  compression  standard  employs  a  rank  m  =  8  DCT  block  transform. 
The  graph  of  this  6x8  matrix  is  shown  in  figure  6-4  where  the  matrix  is 
depicted  as  a  surface  by  interpolating  the  matrix  entries.  The  top  rows 
is  the  (constant)  low  pass  scaling  vector.  The  other  seven  rows  are  high 
pass.  The  matrix  elements  are  the  top  values  of  the  DCT  digital  filter.  In 
section  6.3.1  DCT  compression  is  compared  with  compression  employing 
rank  m  =  2  wavelets  matrices  of  higher  genus. 

Experiments  to  compress  images  using  the  Hadamard  transform  of  rank 
m  =  64  yielded  poor  results  (cp.  [42]),  as  one  would  expect  from  examining 
figure  6-5  which,  although  it  illustrates  the  rank  m  =  8  Hadamard  wavelet 
matrix,  already  begins  to  show  the  roughness  in  the  transform  matrix  that 


^The  Initmtiional  Siandtrit  Organixation  Joiai  Piciart  EgftrU  Groap  is  the  body 
charged  with  catabUehing  itandards  for  ttill  image  tranmitaioD  and  oomprcMion  fmmaU. 
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Figure  6-5:  Hadamard  transform  matrix:  m  =  8,g  =  1. 


leads  to  characteristic  artifacts  and  low  compression  ratios. 


6.3  Wavelet  Image  Compression 

Today,  the  most  successful  general  image  compression  method  is  based  on 
the  two  dimensional  product  basis  constructed  from  the  Daubechies  wavelet 
matrix  D3.  Recall  that  DZ  provides  a  low  pass  representation  of  polyno¬ 
mials  of  degree  less  than  3.  From  this  it  follows  that  the  product  basis 
provides  a  low  pass  expansion  of  polynomial  functions  of  two  variables,  say 
X  and  y,  each  of  degree  less  than  3.  These  surfaces  are  good  models  for  light 
reflected  from  smooth  surfaces  such  as  walls,  or  the  cheeks  and  forehead  of 
a  face.  The  ability  to  represent  reflected  light  energy  that  produces  surfaces 
in  a  digital  image  means  that  this  trasnform  is  effective  in  concentrating 
a  large  fraction  of  the  image  energy  in  a  small  number  of  low  pass  C^ear 
dc”)  transform  coefScients. 

Figures  6-6  and  6-7  show  the  scaling  function  and  fundamental  wavelet 
for  DZ. 
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Figure  6-8:  Stereogram  of  the  product  of  scaling  functions  >p{z)(p{y)  for 
DZ.  (The  origin  of  coordinates  is  at  the  lower  right.) 


Figures  6-10  tnrough  6-13  display  the  product  basis  functions 

v(*)V’(y),  V>(*)vp(y).  V’(*)ti’(y), 

each  of  which  corresponds  to  a  two  dimensional  digital  filter.  The  filters  that 
correspond  to  these  basis  functions  transform  the  original  image  into  four 
components:  the  low-low,  low-high,  high-low,  and  high-high  parts  of  the 
transform.  Each  of  these  outputs  contains  one  quarter  as  many  coefficients 
as  the  input  image,  so  the  total  number  of  output  coefficients  is  the  same  as 
the  number  of  input  coefficients.  This  fact  makes  it  possible  to  conveniently 
display  the  transform  coefficient  output  as  an  image  also  by  scaling  the 
coefficients  to  fall  in  the  range  of  the  input  pixel  values  -  say  the  range  C 
to  255  for  an  8  bit  grayscale  image. 

The  Lenna  figures  2  and  5-73  were  constructed  this  way.  They  iUustrate 
the  first  stage  in  applying  the  wavelet  transform  for  compression. 

Figures  6-8  and  6-9  are  stereograms  of  the  graph  of  the  “Lo-Lo”  function 
^(*)v’(y)>  the  “Hi-Hi”  function  V’(*)V’(y).  respectively.  The  reader 
should  focus  at  infinity  to  view  these  stereograms. 
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Figure  6-9:  Stereogram  of  the  product  of  wavelet  functions  V*(z)^(y)  for 
D3.  (The  origin  of  coordinates  is  at  the  lower  right.) 
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Figure  6-10:  Two-dimensional  low-low  product  wavelet  basis  function  built 
from  D3:  (p{x)(p{y). 


Bandwidth  Compression 


157 


Figure  6-11:  Two-dimensional  low-high  pass  product  wavelet  basis  function 
built  from  D3:  (p(x)i/)(y). 
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Figure  6-12:  Two-dimensional  high-low  pass  product  wavelet  basis  function 
built  from  DZ:  tlf{x)<p(y). 
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Figure  6-13:  Two-dimensional  high-high  pass 
tion  built  from  D3:  V’(*)V’(l/)- 
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6.3.1  Comparison  of  Conventional  and  Wavelet  Image 
Compression 

In  this  section  we  compare  images  compressed  by  conventional  block  trans¬ 
forms  and  their  wavelet-unified  higher  genus  versions,  for  various  choices  of 
the  rank,  genus,  and  characteristic  Haar  wavelet  matrix. 

The  conventional  block  transform  filters  for  image  compression  described 
in  section  6.2.2  for  the  Discrete  Cosine  (DCT)  and  Hadamard  transforms 
are  the  characteristic  Haar  wavelet  matrices  illustrated  in  this  section.  We 
consider  ranks  m  =  2,3,4  and  8,  and  genera  p  =  1  (the  block  transform), 
2,  3,  and  4. 

In  order  to  make  even  small  differences  perceptible,  the  source  image, 
shown  in  figure  6-14,  is  a  64  x  64  pixel  square  centered  on  the  right  eye 
of  the  Lenna  image.  There  are  8  bits  of  grayscale  information  per  pixel. 
Each  pixel  in  the  original  image  has  been  magnified  (without  interpolation 
or  smoothing)  so  that  it  is  an  8  x  8  square  that  occupies  64  pixels. 

The  source  image  was  compressed  at  a  compression  ratio  of  32-to-l 
using  the  two  dimensional  tensor  product  wavelet  basis  constructed  from 
the  designated  l-dimensional  wavelet  matrix.  The  grayscale  pixel  values  - 
integers  from  0  to  255  -  were  interpreted  as  values  of  a  function  on  a  two 
dimensional  square  lattice.  Four  levels  of  a  Mallat  tree  decomposition  were 
used  to  represent  the  grayscale  image  function  as  a  wavelet  matrix  series. 
The  resulting  collection  of  wavelet  matrix  series  coefficients  was  quantized 
but,  in  the  interest  of  simplifying  this  discussion,  the  quantization  algorithm 
is  a  simplified  version  of  one  that  would  be  used  to  obtain  the  best  quality. 

Each  eye  image  was  equalized  to  provide  better  contrast  for  printing. 
The  laser  printer  process  used  to  print  this  volume  employs  dithering  to 
create  the  gray  tones  displayed  in  the  figures.  The  dithering  process  tends 
to  interpolate  and  “low  pass  filter”  the  original  digital  image.  Nevertheless, 
the  dithered  displays  provide  a  useful  indication  of  the  systematic  effects 
of  using  wavelet  matrices  of  varing  rank,  genus,  and  characteristic  Haar 
wavelet  matrix  for  image  compression. 

Figures  6-15  through  6-18  demonstrate  the  importance  of  overlapping 
basis  fynctions  and  Daubechies  wavelet  matrices  for  capturing  the  polyno¬ 
mial  smoothness  inherent  in  the  source  image  and  reducing  blocking  arti¬ 
facts  in  the  decompressed  image.  As  the  genus  g  increases,  the  blocking 
artifacts  are  decreased,  but  so  too  is  fruthful  representation  of  edge  infor¬ 
mation  because  the  larger  support  of  the  scaling  function  is  less  sensitive 
to  localized  detail.  The  optimum  combination  of  polynomial  smoothness 
and  sensitivity  to  edge  detail  appears  to  occur  for  p  =  3  or  y  =  4. 
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Figure  6-14:  Letina  eye  source  image. 


Figure  6-15:  Lenna  eye  decompressed  at  32-to-l  via  D1  (m  =  2,jf  =  1). 
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Figure  6-16:  Lenna  eye  decompressed  at  32-to-l  using  D2  (m  =  2,g  =  2). 


Figure  6-17:  Lenna  eye  decompressed  at  32-to-l  usmg  i?3  (m  =  2,ff  =  3). 
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Figure  6-18:  Lenna  eye  decompressed  at  32-to-l  using  D4{m  =  2,g  =  4). 


Figure  6-19  illustrates  the  characteristic  properties  of  an  image  com¬ 
pressed  with  the  rank  m  =  8  Discrete  Cosine  IVansform  used  in  the  ISO 
JPEG  still  image  compression  international  standard.  The  blocking  arti¬ 
facts  due  to  the  unoverlapped  transform  are  obvious  and  distracting.  The 
genus  g  =  2  “waveletized”  DCT  illustrates  the  benefit  of  employing  over¬ 
lapped  basis  functions. 

Compared  with  the  rank  m  =  2,  genus  =  1  block  transform,  the  m  =  8 
DCT  yields  a  much  more  faithful  decompressed  image  because  the  higher 
rank  is  one  way  of  increasing  spectral  resolution  to  c^ture  more  edge  in¬ 
formation.  But  the  figures  suggest  that  increasing  spectral  resolution  by 
increasing  the  genus  of  the  transfiorm  rather  than  its  rank  is  more  effective. 
This  observation  has  been  confirmed  by  extensive  studies  at  Aware,  Inc. 


164 


An  Essay  on  Wavelet  Technology  Aware,  Inc.  AD921018 


Figure  6-19:  Lenna  eye  decompressed  at  32-to-l  using  the  DOT  (m  =  8,  = 
1). 


Figure  6-20:  lenna  eye  decompressed  at  32-to-l  using  the  “waveletized’ 
DCT(m  =  8,^  =  2). 
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Figure  6-21:  Lenna  eye  decompressed  at  32-to-l  using  the  Hadamard  trans¬ 
form  (m  =  8,ff  =  1). 


Figure  6-22;  Lenna  eye  decompressed  at  32-to-l  using  the  ‘^aveletized” 
Hadamard  transform  {m  =  8,g  =  2). 
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Having  seen  so  many  compressed  examples  of  Lenna’s  eye,  let  us  see 
what  happens  when  the  entire  NITF6  Ltnna  image  is  compressed  using  a 
“production  code”  multilevel  transform,  full  quantizer,  and  entropy  coder. 
The  original  NITF6  image  is  displayed  in  figure  6-23.^  For  comparison, 
figure  6-24  is  a  32-to-l  compressed  image  employing  the  D3  wavelet  trans¬ 
form. 

6.4  Multiresolution  Audio  Compression 

6.4.1  Audio  Signals 

It  is  much  the  same  for  sound.  The  ear  selectively  omits  the  less  important 
information,  whidi  is  usually  most  of  an  audio  signal.  Only  a  fraction  of 
the  audio  information  that  activates  the  eardrum  actually  reaches  the  brain 
to  be  “heard.”  If,  for  instance,  two  tones  are  close  in  pitch,  then  the  brain 
will  hear  the  louder  of  the  two  but  not  the  softer.  The  designer  of  an  audio 
compression  algorithm  will  take  the  auditory  consequences  of  this  psychoa¬ 
coustic  “masking  threshhold”  this  into  account.  Figure  6-25  schematically 
illustrates  the  masking  threshold  for  human  hearing.  A  good  compression 
algorithm  will  selectively  omit  the  same  kind  of  information  that  the  hu¬ 
man  sensory  system  selectively  omits  to  achieve  higher  compression  levels 
without  compromising  quality;  cp.  [57],  [58],  [59]. 

Figure  6-26  displays  the  energy  density  as  a  function  of  time  and  fre¬ 
quency  for  an  audio  signal.  Time  progresses  along  the  axis  from  the  lower 
center  toward  the  upper  right,  and  frequency  increases  Unearly  along  the 
axis  from  the  upper  left  to  the  lower  center  in  the  graph.  The  next  figure 
shows  the  same  information  but  the  effect  of  the  psychoacoustic  masking 
threshhold  on  audibility  is  indicated  by  printing  the  masked  signal  energy  in 
light  gray.  Comparison  of  the  figures  shows  that  a  relatively  large  fraction 
of  the  energy  in  an  audio  signal  cannot  be  heard  by  the  ear. 

Designing  a  compression  alghorithm  for  audio  that  takes  masking  into 
account  is  thus  a  problem  in  nonlinear  filter  design. 


*The  original  U  an  8-bit  per  pixel  gr^r  icale  image,  whid>  cannot  be  adequately 
reproduced  by  the  laaer  printer,  llie  image  ihown  here  k  a  dithered  grayacale  expreeaed 
by  binary  (Mack  and  white)  pixela. 
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Figure  6-23:  Original  NITF6  I.enna  image. 
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Figure  6-24:  Lenna  decompressed  at  32-to-l  using  the  DZ  wavelet  transform 
with  quantization  {m  =  2,g  =  3). 
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Figure  6-25:  Schematic  representation  of  the  audio  masking  threshhold 
function. 
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Figure  6-27:  Psychoacoustic  map  of  the  same  audio  signal  showing  effect 
of  the  masking  threshhold. 
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Chapter  7 


Channel  Coding 

7.1  Relation  to  Shannon's  Channel  Coding 
Theorem 

Claude  Shannon,  who  founded  information  theory,  proved  that  it  is  possible 
to  add  redundancy  to  the  information  so  that  a  message  can  be  received 
error-free  and  yet  use  all  of  the  channel’s  capacity.  The  idea  is  simple  al¬ 
though  the  conclusion  is  amazing.  If  the  information  in  a  message  could 
be  spread  throught  the  entire  transmitted  signal,  like  a  hologram  each  of 
whose  parts  contains  information  about  the  whole,  then  noise  bursts  might 
corrupt  some  parts  of  the  transmission  but  would  be  correctly  received,  and 
the  whole  message  could  be  unraveled  from  the  received  parts.  This  is  the 
intuitive  idea  behind  Shannon’s  channel  coding  theorem.  But  the  chan¬ 
nel  coding  theorem  doesn’t  show  how  to  construct  a  good  channel  code 
“hologram.”  The  problem  has  two  parts;  (i)  Find  a  way  to  package  some 
redundancy  along  with  signal  so  that  the  redundant  information  spreads 
the  signal  throughout  the  entire  message;  and  (ii)  The  fraction  of  redun¬ 
dant  information  added  to  the  message  becomes  negligible  as  the  messages 
becomes  arbitrarily  long.  Then  all  of  the  chuinel  capacity  can  be  used 
without  penalty. 

Figure  7-1  illustrates  the  relationship  between  the  noise  energy  per  bit 
No,  the  signal  energy  per  bit  Ei,  the  bandwidth  in  Hertz  of  the  channel 
IV,  and  the  channel  capacity  in  bits  C  fc«  a  binary  symmetric  channel. 
Error-free  conununications  are  possible  for  every  point  that  lies  below  the 
curve. 

A  channel  coding  algorithm  attempts  to  provide  a  ccmstructive  and 
practical  realization  of  Shannon’s  Channel  Coding  Theorem,  which  loosely 
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Figure  7-1:  Shannon’s  channel  coding  theorem. 


CtaniMl  CiShts  Hmmmii 


asserts  that  arbitrarily  rehabk  communication  is  possible  at  any  rate  below 
channel  capacity;  [37,  53,  54].* 

Although  Shannon’s  theorem  is  not  constructive,  it  does  prescribe  an 
unrealizable  procedure  that  would  code  or  “modulate”  the  message  bit- 
stream  onto  the  signal  carrier  so  as  to  completely  utilize  channel  capacity. 
This  procedure  empl(^s  modulating  codewords  that  are  overlapped  random 
coding  sequences  or  “waveforms”  of  infinite  duration.  If  distinct  codewords 
represent  the  message  symbols  “0”  and  “1”,  then  the  statistical  orthogo¬ 
nality  of  random  waveforms  insures  that  cross-correlations  are  zero  and  the 
autocorrelation  of  each  random  waveform  is  the  delta  distribution.  Hence, 
despite  the  presence  of  channel  noise,  the  received  message  can  be  decoded 
by  passing  it  through  a  set  of  filters  matched  to  the  rand(»n  waveforms. 

The  key  factors  that  make  this  procedure  work  are: 

1.  Codewords  that  are  pairwise  orthogonal; 

2.  Codewords  that  are  infinitely  long; 

*The  Channel  Coding  Theorem  was  fi»t  proved  for  a  Binary  Symmetric  ChanneL  It 
hae  been  extended  to  more  general  rhannele. 
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3.  Explicitly  known  matched  filters  for  detecting  the  codewords. 

•  Orthogonality  is  used  by  the  matched  filter  detectors  to  distinguish 
the  different  codewords  from  each  other  and  from  channel  noise,  with 
which  (under  suitable  assumptions  about  the  channel  noise  statistics) 
the  codewords  have  small  or  vanishing  correlations. 

•  Infinite  length  codewords  are  employed  to  spread  the  infwmation  rep¬ 
resented  by  the  encoded  bit  throughout  the  entire  signal,  thereby  cre¬ 
ating  a  kind  of  “hologram”  whose  smallest  parts  contain  information 
about  every  part.  This  “de-localization”  of  the  information  makes 
the  codeword  resistant  to  localized  noise  effects  and,  in  the  limit,  to 
arbitrary  noise  effects. 

That  the  codewords  are  random  waveforms  is  not  essential;  random¬ 
ness,  combined  with  infinite  duration,  simply  guarantees  orthogonality  and 
delocalization. 

The  matched  filters  can  be  statistical  or  deterministic;  deterministic 
filters  are  better  in  combination  with  soft  decision  detection  procedures. 

It  appears  that  in  the  limit  of  arbitrarily  long  waveforms,  wavelet  chan¬ 
nel  coding  realizes  the  upper  bound  for  information  transmission  given  by 
Shannon's  Channel  Coding  Theorem.  That  is,  wavelet  channel  encoded  in¬ 
formation  can  be  transmitted  at  a  rate  that  is  as  close  as  one  pleases  to 
channel  capacity  with  aribitrarily  small  bit  error  rates.  Ftom  a  practical 
standpoint,  the  significance  of  this  statement  is  that  wavelet  channel  en¬ 
coding  provides  a  constructive  and  practical  soluticm  to  design  of  efficient 
codes.  Moreover,  a  wavelet  channel  codec  can  be  efficiently  implemented 
in  VLSI  technology. 


7.2  Wavelet  Channel  Coding 

The  user  of  a  digital  communication  system  will  primarily  be  concerned 
with  the  bit  rate  and  the  error  rate  of  the  communications  channel.^  The 
channel  is  controlled  by  allocating  power  and  bandwidth,  both  of  which  are 
precious  resources. 

Every  communications  channel  experiences  noise.  Additive  white  Gaus¬ 
sian  noise  (abbreviated  “AWGN”)  is  often  a  good  first  approximation  be¬ 
cause  it  results  from  thermal  effects,  which  are  always  present.  Other  types 
of  noise  are  important  in  specific  applications.  The  communications  system 
designer  would  like  to  achieve  the  objectives  of  high  bit  rate  and  low  error 


^The  bit  rate  U  the  meMage  throu^put  in  bit*  per  second.  The  error  rate  is  the 
fraction  of  input  message  bits  that  are  in  error  at  the  receiver. 
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rate  with  a  miniinuin  of  power  consumption  and  bandwidth  requirements. 
These  objectives  are  inconsistent;  trade-oib  must  be  made  that  prefer  high 
bit  rate  to  low  error  rate  or  low  error  rate  to  high  bit  rate;  low  power  re¬ 
quirements  to  low  bandwidth  or  low  bandwidth  to  low  power  requirements. 
These  trade-offs  determine  the  type  of  modulation  and  error  correction  that 
will  be  selected. 

Wavelet  channel  coding  (“WCC”)  provides  a  systematic  conceptual  and 
design  method  for  making  these  trade-ofls  through  software  selection  within 
a  common  chipset-based  hardware  framework. 

The  advantages  of  wavelet  channel  coding  result  from  the  combination 
of  the  exact  orthogonality  of  wavelet  codewords,  the  spreading  of  message 
symbol  information  throughout  the  overliq>ped  codeword  symbols,  and  the 
ability  to  utilize  soft  decisions  to  decode  wavelet  channel  coded  symbok. 

The  main  properties  of  WCC  coding  are  summarized  in  the  following 
list: 

1.  Wavelet  Channel  Coding  is  a  new  class  of  channel  coding  algorithms. 

2.  WCC  offers  a  systematic  and  simple  approach  to  the  full  range  of 
coding  problems,  from  low  rate,  power  efficient  codes  to  high  rate, 
bandwidth  efficient  codes. 

3.  Wavelet  Channel  Coding  can  be  ^plied  in  a  Hock  coding  mode  or  in 
a  tnllis  coding  mode. 

4.  WCC  codewords  are  strictly  orthogonal. 

5.  WCC  codewords  have  small  correlations  for  non-codeword  offsets. 

6.  WCC  coding  employs  robust  soft  decision  decoding. 

7.  WCC  codes  can  be  arbitrarily  long. 

8.  WCC  codewords  have  desirable  Hamming  distance  properties  and 
provide  good  error  correction  capability. 

9.  WCC  codes  can  be  efficiently  coded  and  decoded  with  simple  VLSI 
circuits. 

10.  Synchronization  can  be  perfmned  by  correlation  matching. 

Analysis  and  computer  simulaticns  show  that  WCC  with  arbitrary  length 
wavelets  achieves  the  AWGN  performance  of  coherent  binary  phase  shift 
keying  (“BPSK”).  When  compared  under  a  constant  channel  rate  criterion, 
WCC  provides  coding  gains  of  approximately  10  log2(mjr-|-  l)dB.  On  pulsed 
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interference  and  flat  fading  channels  WCC  outperforms  BPSK  with  gains 
that  are  dependent  of  the  wavelet  sequence  length. 

Figures  7-2  through  ??  display  the  performance  improvements  of  wavelet 
channel  coding  in  three  environments; 

1.  An  AWGN  channel  with  5%  burst  noise  compared  with  Pulse  Ampli¬ 
tude  Modulation  (“PAM”). 

2.  An  AWGN  fading  channel,  compared  with  PAM. 

3.  A  flat  Rayleigh  channel,  compared  with  BPSK. 

7.2.1  Modulation 

There  are  three  basic  types  of  modulation.  Each  one  corresponds  to  mod¬ 
ulation  of  one  of  three  parameters:  amplitude  A,  frequency  u,  and  phase 
6.  To  clarify  this,  suppose  that  the  digital  communication  channel  employs 
plane  electromagnetic  waves  as  the  transmission  medium.  The  modulator 
in  a  digital  communications  system  maps  the  symbols  that  are  to  be  trans¬ 
mitted  to  the  amplitude,  phase  or  frequency  of  the  carrier.  The  modulator 
output  can  be  represented  as 

s(<)  :=  Re  (u(t)cJ2'/‘‘)  .  (7-1) 

This  is  a  bandpass  signal  in  which  /«  represents  carrier  frequency  and  u(t) 
the  equivalent  low  pass  message  bearing  waveform. 

The  form  of  u(t)  is  dictated  by  the  form  of  the  modulation  used.  We 
distinguish: 

1.  Amplitude  Shift  Keying:  more  commonly  called  PAM  (Pulse  Am¬ 
plitude  Modulation).  This  is  the  modulation  method  in  which  the 
message  symbols  pn  are  m^ped  to  a  discrete  set  of  amplitudes  so 
that 

u{t):=f^yng{t-nT)  (7-2) 

n=0 

2.  Frequency  Shift  Keying:  (FSK)  in  which  the  symbols  pn  are  mapped 
to  a  discrete  set  of  frequencies  so  that 

„(0  :=f;e»»'«'-‘y(t-nT) 

nsO 


(7-3) 
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3.  Phase  Shift  Keying:  (PSK)  in  which  the  y„s  are  mapped  to  a  discrete 
set  of  phases  so  that 

«(0:=f;e^»-tf(t-nT)  (7-4) 

n=0 

In  all  of  the  above,  the  message  symbol  rate  is  assumed  to  be  1/T  and 
g(t)  is  assumed  to  be  a  pulse  (square  or  other)  of  duration  T. 

In  the  remainder  of  this  chapter,  we  examine  the  implications  of  using 
each  of  these  modulation  techniques  for  the  transmission  of  WC  encoded 
symbols.  We  limit  our  discussion  to  multiplier  2,  flat  WCM  with  maximal 
overlap  trellis  WCC.  Extending  the  discussion  to  higher  multiplier  systems 
and  arbitrary  overlap  signalling  is  straightforward. 
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Figure  7-2:  Wavelet  channel  coding  performance  in  a  channel  with  5%  burst 
noise. 
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Figure  7-3:  Wavelet  channel  coding  performance  in  fading  channels. 
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Figure  7-4:  Wavelet  channel  coding  performance  in  flat  Rayleigh  channels. 
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7.3  The  Wavelet  Channel  Coding  Algorithm 

7.3.1  The  Basic  Idea:  Trellis  WC  Message  Coding  for 

m  =  2 

For  the  purpose  of  illustration,  let  a  message 

Xj ,  X2,  ■  .  •  ,  Xn, .  ■ 

consist  of  a  sequence  of  bits  represented  by  the  signed  units  {-fl, — 1),  and 
let 

(O0,--,O2g-l  j 
6o,...,  63,-1  / 

be  a  rank  2  wavelet  matrix  of  genus  g.  Because  the  rows  of  multiplier  2 
WCM’s  are  mutually  orthogonal  when  translated  by  steps  of  two,  we  can 
take  advantage  of  the  orthogonality  of  the  o’s  and  6*8  to  code  at  i  rate 
of  two  message  bits  per  2*  clock  pulses,  where  1  <  k  <  log3  2g.  Message 
bits  with  odd  sequence  number  will  be  coded  using  the  WCM  waveform 
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(ao, . . .  ,a3f_x)  and  message  bits  with  even  sequence  number  will  be  coded 
using  the  WCM  waveform  (6o, .  •  • 

The  coding  procedure  is  illustrated  in  Table  7.1  for  a  system  with  mozt- 
mally  overlapped  wavelet  sequences.  This  corresponds  to  the  case  where  the 
code  rate  is  one  information  bit  per  WCC  symbol  (i.e.  ib  =  1  above).  The 
sequence  of  clock  pulses  is  arranged  above  the  upper  rule  of  the  table,  and 
the  sequence  of  transmitted  symbols  pn  ia  shown  below  the  lower  rule  of  the 
table.  The  encoding  of  each  message  bit  Zn  is  shown  in  the  nth  row  below 
the  upper  rule.  The  output  code  symbol  for  odd  n  the  encoding  begins  on 
the  nth  clock  pulse  and  that  for  even  n  the  encoding  begins  on  the  (n— l)th 
clock  pulse.  is  equal  to  the  sum  of  the  nth  column  of  encoded  message 
bit  values  and  is  therefore  not  restricted  to  the  values  ±1.  The  formula  for 
computing  the  symbol  p„  to  be  transmitted  at  the  nth  clock  pulse  is 

Pn  =  {a?2i+ian-2*~l  +  *2t+2^n-2t-l }  (7-5) 

k 

Some  observations  are  in  order: 

1.  It  is  evident  that  code  rates  as  small  as  l/g  can  be  achieved  by  varying 
the  overlap  of  the  wavelet  sequences;  in  the  limit  where  the  code  rate 
is  l/g,  the  wavelet  sequences  are  non  overlapped  but  adjacent. 

2.  When  flat  WCM  are  used,  the  WC  encoded  symbols  that  are  gen¬ 
erated  by  the  trellis  WCC  algorithm  are  binomially  distributed.  We 
will  discuss  this  in  more  detail  in  section  7.3.5  below. 

7.3.2  Recovery  of  a  Trellis  WC  Coded  Message  for  m  = 
2 

The  sequence  of  message  bits  Xn  can  be  recovered  from  the  transmitted 
symbol  sequence  by  a  matched  filter  which  uses  the  orthogonality  of  the 
WCM  waveform  sequences.  For  the  illustrative  implementation  of  waveform 
symbol  generation  described  in  the  previous  subsection,  the  message  bit 
x„  =  ±1  is  identified  with  the  sign  of  the  output  of  a  filter  matched  to  the 
WCM  waveform  with  input  the  received  symbol  sequence.  To  show  this, 
suppose  that  the  receiver  has  received  the  symbols 

Pi,P2,  -..yfi,..-  - 

We  will  calculate  the  correlation  of  this  symbol  sequence  with  the  WCM 
vectors  (ao,...,03,_i)  and 

First,  consider  the  output  of  the  filter  matched  to  (uq,  . . .  ,03^.1),  i.e. 
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X^a„.3i.iy„  ,  (7-6) 

n 

where  /  is  an  arbitrary  integer.  Equation  (7-5)  implies 

=  Hn  “n-JI-l  S,  odd  +  *1+1^"-*) 


—  odd  {*«  53n  ®n-at-lfln-»+ 


(7-7) 


=  2*a/+i  1 

where  we  have  used  the  orthogonality  of  the  WCM  sequences  to  obtain  the 
last  line.  A  similar  argument  yields  the  formula  for  correlation  with  the 
sequence  (6o, . .  ..ht-i),  which  decodes  the  bits  zji  corresponding  to  even 
clock  pulses. 

More  sophisticated  WC  coding  and  decoding  techniques  have  been  de¬ 
veloped  by  Aware  and  are  currently  under  investigation. 

7.3.3  The  Simplest  Case:  Block  WC  Message  Coding 
with  m  =  2 

Suppose  that  the  message  is  periodic  with  period  2g  equal  to  the  length 
of  the  wavelet  sequences  employed  by  a  trellis  coder.  In  this  case  WCC 
performs  the  function  of  a  block  encoder,  which  we  describe  in  this  section. 
If  the  message  is  the  row  vector 


I  :=  (iQi- •  •  .»2j-i)  (7-8) 

where  Xj  €  (1,  -1},  then  the  coded  message  is  the  codeword  row  vector 


y  •■=  (yo,---, 

Vig-x) 

(7-9) 

where 

y  =  xU 

(7-10) 

and 

U  :=  iUjk), 

(7-11) 

Ui,7k  := 

1 

^®0+J*)nio<J2#; 

(7-12) 

Uj,3k■^■\  '■= 

^^0+2*)mo<U< 

(7-13) 
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Theorem.  U  is  a  unitary  matrix. 

In  the  absence  of  noise,  the  decoding  algorithm  is  simply 

*  =  yU~^  =  yU*  (7-14) 

When  a  flat  WCM  is  used  for  block  coding,  constant  power  encoded  symbols 
are  generated.  This  is  not  the  case  for  trellis  WCC,  even  when  flat  wavelet 
matrices  are  utilised. 

7.3.4  Code  Rate  and  Channel  Rate  for  Arbitrary  Over¬ 
lap  Signalling 

We  discussed  in  section  7.3.1  the  ability  to  alter  the  code  rate  of  the  wavelet 
channel  code  by  changing  the  amount  by  which  the  WCM  waveforms  over¬ 
lap.  In  general  we  see  that  arbitrary  overlap  signalling  is  capable  of  pro¬ 
ducing  variable  rate  codes.  A  maximally  overlapped  signalling  technique 
produces  a  code  rate  of  1  (i.e.  1  information  bit  per  codeword  symbol); 
anything  less  than  this  will  produce  a  lower  rate  code.  A  code  with  code 
rate  equal  1  does  not  compromise  information  throughput;  one  information 
bit  is  transmitted  per  codeword  symbol. 

In  addition  to  the  code  rate,  it  is  informative  to  define  the  channel  rate 
as  the  number  of  channel  bits  transmitted  per  clock  pulse.  In  the  maxi¬ 
mally  overlapped  signalling  case,  this  is  identical  to  the  channel  rate.  This 
relates  to  the  modulation  scheme  that  is  necessary  to  transmit  the  encoded 
symbols.  The  channel  rate  for  maximally  overlapped  wavelet  sequences  is 
given  by  the  formula 

R=\ - T^— TTv  (7-15) 

logj  {mg  -I- 1) 

By  varying  the  overlap  of  the  WCM  waveforms,  it  is  possible  to  obtain 
WC  codes  whose  channel  r^  is  arbitrary.  This  occurs  because  the  num¬ 
ber  of  values  in  the  WCC  symbol  distribution  decreases  as  the  amount  of 
overly  decreases.  Another  way  to  alter  the  channel  rate  is  by  limiting  the 
modulation  scheme  to  a  number  of  levels  less  than  the  number  of  WCC 
symbols.  This  is  essentially  equivalent  to  performing  nm^e  truncation  on 
the  WCC  symbols.  For  example,  a  maximally  overlapped  trellis  WCC  us¬ 
ing  multiplier  2,  length  8  flat  real  WCM  will  produce  WCC  symbds  with 
possible  values 

-8, -6, -4, -2, 0,2,4, 6, 8  (7-16) 

The  channel  rate  of  this  code  is  log2  9.  It  would  require  a  9-ary  modular 
tion  scheme  for  transmission.  By  truncating  the  values  ±8  fr(»n  the  range 
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of  WCC  symbols,  a  7-ary  modulation  scheme  could  be  used  reducing  the 
channel  rate  to  log]  7.  Ultimately,  the  channel  rate  will  approach  one  as  the 
number  of  WCC  symbol  values  is  reduced  to  two.  The  redundancy  inherent 
in  WC  coding  suggests  that  altering  the  channel  rate  by  range  truncation 
may  be  beneficial  and  our  simulations  of  WCC  Phase  Shift  Keying  (“PSK”) 
bear  this  out. 


7.3.5  Distribution  of  WCC  Symbols 

Monograph  Distribution 

WCC  codeword  symbok  are  not  equiprobable.  Indeed, 


Theorem.  Let  [a]  be  a  m  x  mg  flat  real  WCM.  The  mg  +  1  codeword 
symbols  for  WCC  share  the  common  denominator  m*'  where  J  and  g  are 
related  by  the  equation 

m’-'  =  g,  (7-17) 

and  their  numerators  are 

-mg, . .  .,-2k,  ...,0,...,2k,...,mg.  (7-18) 

These  codeword  symbols  are  binomially  distributed  according  to  the  dis¬ 
crete  probability  density  function  (the  notation  fw  the  probability  displays 
only  the  numerator) 

Prob  (y„  =  2fc  -  my)  =  (  )  (C.b)”*',  0  <  24  <  my  (7-19) 


where  it  is  assumed  that  the  message  bits  ±1  have  equal  probability. 
Theorem.  The  average  value  of  the  variance  of  a  codeword  symbol  y  is 
<  y’  >=  E  (24  -  my)’  (  ^  )  (0.5)*"'  =  my .  (7-20) 

ksO  '  ' 


proof  is  by 
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7.4  Wavelet  Channel  Coding  and  Digital  Mod 
ulation  Techniques 

7.4.1  Wavelet  Channel  Coding  and  M-ary  Pulse  Am¬ 
plitude  Modulation 

When  using  a  multiplier  m  =  2,  flat  real  wavelet  matrix  of  order  gm  in  a 
maximal  overlap  trellis  WCC,  the  encoded  symbols  take  on  the  values 


-mg, ...,  -2k,  ...,0,...,2k,...,mg. 


(7-21) 


M  must  equal  gm-\- 1  for  there  to  be  a  one-to-one  mapping  between  the 
WC  encoded  symbols  and  the  signals  in  the  M-PAM  constellation.  As  the 
length  of  the  wavelet  sequences  increases,  so  does  the  average  energy  of  the 
transmitted  il/-PAM  signal.  We  see,  therefore,  that  an  increase  in  wavelet 
sequence  length  requires  an  increase  in  average  transmitted  symbol  energy. 
However,  for  a  fixed  message  rate,  no  bandwidth  expansion  is  required  to 
transmit  WC  codeword  symbols  using  PAM. 


7.4.2  Wavelet  Channel  Coding  and  M-ary  Phase  Shift 
Keying 

Multiphase  signaling  is  another  means  of  transmitting  digital  data.  The 
wavelet  symbols  are  m^ped  to  phases  on  the  unit  circle.  At  the  demodu¬ 
lator  it  will  be  necessary  to  map  the  received  signal  phase  back  to  the  range 
of  the  codeword  symbol  distribution  before  processing  by  the  matched  fil¬ 
ter  wavelet  decoder.  A  phase  detector  is  therefore  used  to  produce  the  soft 
decisions  that  are  passed  to  the  WC  trellis  decoder. 

It  is  interesting  to  observe  the  distribution  of  WCC  symbols  when  they 
are  normalized  to  lie  on  a  compact  interval  as  gm  — »  oo.  The  normalized 
WCC  symbols  cluster  near  zero  as  gm  increases.  A  similar  clustering  be¬ 
havior  would  obviously  be  observed  on  the  unit  circle,  as  seen  by  mapping 
the  codeword  symbols  to  the  interval  [— z',x]  and  wr^ping  this  interval 
around  the  unit  circle. 

The  issue  of  optimally  placing  the  codeword  symbols  around  the  unit 
circle  is  not  a  trivial  one.  We  have  shown  that  it  is  optimal  to  place  the 
codeword  symbols  so  that  the  distance  between  signal  constellation  points 
is  proportional  to  the  probabUity  of  symbol  occurrence,  i.e.  a  symbd  that 
occurs  frequently  is  placed  further  from  its  adjacent  constellation  points 
than  a  symbol  that  does  not.  The  unit  circle,  however  presents  an  additional 
problem.  Wra];>-around  places  the  largest  negative  value  adjacent  to  the 
largest  positive  value  in  the  signal  constellation.  This  is  undesirable  because 
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in  the  presence  of  noise,  a  sign  reversal  the  codeword  symbol  value  is 
possible  with  catastrophic  implications  at  the  trellis  decoder.  Aware  has 
found  the  following  rules  of  thumb  to  be  successful: 

1.  truncate  the  range  of  transmitted  codeword  symbds  to  those  that 
occur  on  the  order  of  95%  of  the  time. 

2.  map  these  to  the  unit  circle  so  that  approximately  two  thirds  of  the 
entire  circle  is  used  (thus  keeping  the  most  negative  and  moat  positive 
codeword  symbols  at  a  distance  of  approximately  ^.)  Ideally  the  op¬ 
timum  distance  between  these  “end”  values  (i.e.  the  most  positive  and 
most  negative  codeword  symbols)  can  be  related  to  the  signal  to  noise 
ratio  (SNR)  at  which  the  coimnunications  system  is  operating.  If  for 
instance,  the  SNR  is  known  in  an  additive  white  Gaussian  channel, 
the  end  values  can  be  placed  on  the  unit  circle  to  obey  an  apriori 
determined  probability  of  sign  reversal. 

When  M-PSK  is  used  to  transmit  WCC  symbols,  an  increase  in  the 
wavelet  sequence  length  causes  a  decrease  in  the  distance  between  adjacent 
signals  in  the  constellation.  No  increase  in  symbol  energy  is  incurred  as  the 
length  of  the  wavelet  sequences  increases.  Also,  at  a  constant  message  rate, 
the  code  rate  1  trellis  WCC  requires  no  bandwidth  expansion  regardless  of 
the  wavelet  sequence  length.  In  the  limit,  as  the  wavelet  sequence  length 
goes  to  infinity,  the  WCC  symbols  m^  to  the  entire  unit  circle  (assum¬ 
ing  no  range  truncation  is  used.)  This,  in  a  sense,  is  equivalent  to  using 
analog  phase  modulation  for  digital  data  transmission  and  is  an  interesting 
prospect  Aware  is  currently  evaluating. 

7.4.3  Wavelet  Channel  Coding  and  M-ary  iVequency 
Shift  Keying 

The  form  of  the  message  bearing  low  pass  equivalent  signal  is  given  in  eq. 
(7-3).  The  criteria  for  selecting  the  frequency  separation  between  adjacent 
frequencies  in  the  signal  constellation  are  identical  to  those  used  in  standard 
FSK  modulation.  At  the  receiver,  the  output  from  a  set  of  gm+ 1  bandpass 
filters  (each  tuned  to  an  appropriate  frequency)  can  be  used  to  obtain  a  soft 
decision  on  the  WCC  symbol.  Other,  nnore  sophisticated  demodulation 
schemes  are  possible. 

When  FSK  is  used  for  the  transmission  of  WCC  s3nnbols  and  a  constant 
message  rate  is  assumed,  an  increase  in  bandwidth  is  required  for  each 
increase  in  wavelet  sequence  length.  The  average  energy  of  the  transmitted 
wavelet  symbd  does  not,  however,  increase  with  sequence  length;  it  is  given 
by  the  energy  of  the  pulse  g(i). 
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7.5  Performance  of  Wavelet  Channel  Cod¬ 
ing 

7.5.1  Introduction 

The  performance  of  the  multiplier  2  WCC  algorithm  presented  in  section 

7.3.1  for  a  number  of  wavelet  sequence  lengths,  and  in  communications 
channel  environments  including  additive  white  Gaussian  noise,  pulsed  in¬ 
terference  and  Rayleigh  fading.  The  performance  was  compared  with  clas¬ 
sical  digital  communications  techniques  for  a  constant  code  rate  constraint 
as  well  as  a  constant  channel  rate  constraint. 

7.5.2  Performance  of  WCC  in  Additive  White  Gaus¬ 
sian  Noise 

In  this  section  we  discuss  the  theoretical  performance  of  maximal  over¬ 
ly  trellis  WC  coding  when  used  in  coqjunction  with  jl/-PAM  (heretofore 
denoted  as  WCC-PAM)  in  the  presence  of  additive  white  Gaussian  noise 
(AWGN). 

Wavelet  channel  coding  maps  information  bits  into  coded  symbols  that 
are  (mg  -f  l)-ary  for  an  order  mg  WCM.  The  probability  density  function 
is  binomial,  and  the  average  value  of  a  codeword  symbol  y’  is  given  by 
eq(7-20).  For  PAM  the  amplitudes  of  the  equivalent  low-pass  waveforms 
are  the  symbols  y,-.  The  energy  of  this  waveform  is  proportional  to  yf ,  so 
the  average  signal  energy  is  proportional  to  the  average  value  of  yf,  i.e.  my; 
cp.  eq(7-20). 

If  we  let  Zi  =  y,  +  Hi  be  a  noise-cwrupted  symbol,  then  y^  represents 
the  signal  component  (i.e.  a  WCC  symbol)  and  ni  a  realization  of  mean  0, 
variance  No/2  AWGN.  We  also  let  Si  and  fZ,  represent  the  signal  and  noise 
components  at  the  output  of  the  wavelet  matched  filter,  respectively. 

The  AWGN  noise  term  in  the  Ri  term  has  mean  value  zero  and  variance 

Var  =  rng^  .  (7-22) 

It  follows  (using  notation  similar  to  that  in  section  7.3.2  )  that  the  proba^ 
bility  of  a  bit  error  is  (cp.  [61]) 

A  =  2  (Nv+i  +  5*1+1  >  0  I  *ai+i  =  — 1) 

-bProb(A2j+i  -t-  521+1  <  0 1  lai+i  =  +1)} 

=  2  (^w+i  >  I  *ai+i  =  —1) 

■fProb(iZ2»+i  >  N  I  *21+1  =  +1)}  . 


(7-23) 
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Since  the  noise  term  Jlsf+i  is  independent  of  this  can  be  written  as 
A  =  P«b(«„..>X)  =  0(^)=p(y^  (7-24) 

where 

(7-25) 

Performance  Results  with  Rate  1  WC  Codes 

Since  rate  1  WC  codes  maintun  the  original  bit  rate,  a  comparison  with 
classical  binary  modulation  techniques  (BPSK,  BFSK,  etc.)  is  apprc^riate. 
Let  Ei  represent  the  energy  transmitted  over  the  channel  per  message  bit. 

It  can  be  shown  that  WCC-PAM  in  AWGN  achieves  the  bit  error 
rate  (BER)  performance  of  coherent  BPSK  for  arbitrary  length  wavelet 
aequences,  and  WCC  outperforms  ccAerent  BPSK  in  channel  environments 
other  than  AWGN.  This  is  due  to  the  property  WCC  to  spread  mes¬ 
sage  bit  information  throughout  the  codeword  symbols.  As  a  result  of  this 
spreading,  we  see  improvements  in  performance  with  increased  wavelet  se¬ 
quence  lengths  in  pulsed  interference  and  fading  channels. 

Performance  Results  with  Constant  Channel  Rate 

We  now  compare  WCC-PAM  to  uncoded  PAM  under  a  constant  channel 
rate  criterion.  In  order  to  maintain  the  message  bit  rate  and  since  Iog2(mjr-f 
1)  bits  are  transmitted  per  WCC  codeword  symbol,  it  is  necessary  to  assume 
that  a  bandwidth  expansion  factor  of  log^{mg  -1-  1)  is  allowable  on  the 
channel  without  significant  intersymbol  interference. 

Let  Ec  be  the  energy  per  channel  bit.  The  BER  performance  is  ex¬ 
pressed  in  terms  of  E^  as 


Pi  =  Q 


(2EJog2(mg 
No 


(7-26) 


The  simulated  performance  of  WCC-PAM  was  compared  to  9-PAM  and 
33-PAM.  used  for  coded  and  uncoded  PAM.  As  seen  from  the  simulation 
curves,  mg-WCC  achieves  gains  of  approximately  101og2(m9  l)dB  over 
(mg-hl)-aTy  PAM.  Note  that  these  curves  are  plotted  b  terms  of  channel  bit 
SNR:  Ee/No.  For  PAM  this  is  equivalent  to  information  bit  SNR  Es/Nq. 
For  WCC-PAM  Ee/No  is  equivalent  to  Et/No  only  with  the  bandwidth 
expansion  assumption  given  above. 


190 


An  ESssay  on  Wavelet  Technology  Aware,  Inc.  AD921018 


7.5.3  Performance  on  Pulsed  Interference  Channels 

The  pulsed  interference  (also  called  burst  noise)  channel  is  an  AWGN  chan¬ 
nel  that  is  occasionally  corrupted  by  large  bursts  of  noise  or  interference 
(either  intentional  or  unintentional).  The  inherent  property  of  WC  codes 
to  spread  a  message  bit  over  a  kn^h  mg  WC  codewOTd  suggests  that  the 
use  of  WCC  in  burst  noise  channels  would  be  beneficial.  Indeed,  as  we 
show  in  this  section,  the  effects  of  pulsed  interference  are  ameliorated  by 
using  WCC,  with  perfwmance  improvements  that  increase  with  increasing 
wavelet  sequence  length. 

Performance  Results 

The  performance  of  WCC-PAM  in  pulsed  interference  was  derived  using 
computer  simulation.  It  is  assumed  that  the  noise  bursts  occur  randomly 
and  are  of  duration  no  longer  than  that  of  a  symbol.  If  an  interleaver  with 
length  at  least  that  of  the  burst  is  used,  the  results  apply  to  longer  duration 
bursts  as  well.  Symbols  affected  by  pulsed  interference  are  simply  blanked 
out  at  the  receiver  causing  a  symbol  erasure.  The  justification  for  this  is 
that  all  information  in  the  amplitude  of  the  PAM  signal  has  been  destroyed 
by  the  pulsed  interference,  hence  setting  the  affected  symbol  to  sero  (the 
value  of  the  most  probable  codeword  symbol)  is  a  reasonable  choice. 

It  is  apparent  that  WCC  provides  significant  gains  over  uncoded  BPSK 
in  the  presence  of  pulsed  interference  and  that  these  gains  grow  with  in¬ 
creases  in  the  wavelet  sequence  length. 
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Table  7  ’  F  .ample  of  Wavelet  Channel  Coding 


1 

2 

4 

...  2j 

2®  + 1 

2g  +  2 

• 

*1«0 

xjbo 

Xiai 

xj6i 

xiaj 

xjij 

xsao 

Xibo 

Xjas 

xj6j 

xsai 

x«6j 

. . .  ztajg^i 
. . .  *ih2g~i 
. . .  xjasf-s 
...  X462«-3 

xaa3f-2 

X9a2f-1 

X*hig-l 

Vi 

Vi 

Vi 

>4 

...  ira. 

Vtt+i 
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Chapter  8 

Numerical  Solution  of 
Partial  Differential 
Equations 


8.1  Understanding  PDE 

The  motion  and  interaction  of  physical  things  on  the  scale  of  human  beings 
is  described  by  Newton’s  laws,  which  mediate  between  the  submicroscopic 
world,  governed  by  quantum  mechanics,  and  gravity,  governed  by  general 
relativity.  For  the  most  part,  these  laws  and  relationships  are  expressed 
in  terms  of  relations  between  physical  quantities,  such  as  force  and  energy, 
and  the  rate  at  which  they  change  with  time  and  distance.  The  equations 
that  express  the  relations  between  a  quantity  and  its  rate  of  change  are 
differential  equations. 

The  differential  equations  that  govern  the  diffusion  of  heat  through¬ 
out  an  object  (the  “diffusion  equation”),  the  propagation  of  electrical  cur¬ 
rents  and  radio  waves  (“Maxwell’s  equations”),  the  propagation  of  of  sound 
waves  (the  “wave  equation”),  the  structural  behavior  of  of  materials  (the 
“continuum  mechanics  equation”),  and  the  flow  of  viscous  fluids,  ranging 
from  air  to  water  and  honey  (the  “Navier-Stdces  equations”)  are  the  basic 
equations  that  govern  the  phenomena  of  daily  life.  Engineers  who  design 
complex  products  usually  use  numerical  solutions  of  one  or  more  differential 
equations  as  a  tool. 

The  solutions  of  differential  equations  also  relate  the  values  of  paran>- 
eters  that  can  be  controlled  by  the  user  to  the  behavior  of  a  process  or 
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system,  just  as  a  driver  rotates  the  steering  wheel  to  control  a  car’s  direc¬ 
tion. 

It  is  usually  difficult  to  calculate  numerical  solutions  for  differential 
equations  in  real  time;  in  some  cases,  because  of  the  large  amount  of  calcu¬ 
lation  that  would  be  needed  or  because  the  equation  describes  a  transient 
or  nonlinear  phenomenon  like  a  shock  wave  and  is  sensitive  to  numerical 
errors  due  to  round-off  or  discretization  of  the  space  and  time  variables,  it  is 
impossible.  For  these  reasons,  real-time  process  control  has  not  been  prac¬ 
tical  for  most  processes.  Recent  advances  in  microcomputer  performance 
and  in  wavelet  mathematics  create  the  possibility  of  broadly  applying  the 
numerical  solution  of  differential  equations  to  real-time  control  problems. 


8.2  Wavelet  numerical  solutions  for  differ¬ 
ential  equations 

Aware  has  developed  numerical  methods  for  solving  nonlinear  partial  differ¬ 
ential  equations  using  wavelet  functions.  The  relevant  properties  of  wavelets 
are  that  they  are  compactly  supported,  orthogonal  and  define  bases  for 
Sobolev  spaces.  Being  both  compactly  supported  and  orthogonal,  wavelets 
combine  the  advantages  of  finite  element,  spline,  and  Fourier  spectral  meth¬ 
ods  that  have  been  traditionally  used  to  find  numerical  solutions.  To  test 
the  feasibility  of  the  wavelet  methods  we  have  studied  nonlinear  phenomena 
that  exhibit  singular  behavior.  These  include  shocks  (e.g.  Burgers  equa¬ 
tion,  the  Gas  dynamics/compressible  Navier  Stokes  equations)  and  turbu¬ 
lence  (incompressible  Navier  Stokes).  For  shocks  we  found  some  remark¬ 
able  results.  The  Galerkin  method  based  on  wavelets  (the  Wavelet-Galerkin 
method)  accurately  captured  shocks  for  large  Reynolds  number  and  stably 
described  their  interactions.  The  solutions  decayed  to  the  correct  limit  for 
large  times.  In  comparison,  the  spectral  and  finite  difference  solutions  were 
unstable  and  inaccurate.  The  stad>ility  of  the  Wavelet-Galerkin  method  ex¬ 
tended  to  infinite  Reynolds  number  and  by  a  simple  modification  of  the 
cutoff  boundary  condition  accurately  c^q>tured  shocks  in  the  inviscid  limit 
[30],  [27].  For  turbulence  we  ran  simulations  of  the  two  dimensional  Euler 
flow  and  examined  the  phenomena  related  to  vortex  merger,  filamentation, 
the  gradients  of  vorticity  and  the  long  time  limit.  Our  initial  results  show 
that  the  Wavelet-Galerkin  method  is  as  accurate  as  the  fuUy-dealiased  spec¬ 
tral  method,  is  far  more  stable,  and,  for  each  time  step,  (depending  on  the 
specific  wavelet  basis,  algorithm  and  state  of  optimization),  can  be  faster 
than  the  spectral  method  [68].  It  seems  that  the  wavelet  discretization  al¬ 
lows  the  implementation  of  numerical  schemes  for  Euler  flow  that  are  useful 
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for  long  time  studies  and  that  cannot  be  implemented  by  spectral  or  finite 
difference  methods.  The  basic  numerical  lesson  is  that  scaling  function  ex¬ 
pansions  subdue  the  Gibbs’s  phenomenon,  allow  the  stable  calculation  with 
oscillatory  approximations  that  occur  in  inviscid  calculations,  and  allow  the 
efficient,  iterative  implementation  of  implicit  time  differencing  methods. 

Compact-support  and  the  ability  of  scaling  functions  to  exactly  i^prox- 
imate  features  smoother  than  themselves,  and  hence  to  be  able  to  better 
approximate  discontinuous  features,  is  essential.  The  orthogonality  of  the 
translated  scaling  function  basis  also  guarantees  the  conservative  approxi¬ 
mation  of  the  quadratic  integrals  (energy  and  enstrophy)  by  the  Galerkin 
approximation  to  the  Jacobian  nonlinearity.  The  implicit  time  differencing 
with  the  Wavelet-Galerkin  space  discretization  allows  the  control  of,  i.e. 
conserves,  or  monotonically  increases  or  decreases,  the  energy  and/or  en¬ 
strophy.  It  seems  clear  that  wavelet  based  numerical  methods  for  turbulence 
simulations  hold  considerable  promise  and  warrant  further  development. 

We  have  also  developed  a  wavelet-based  method  for  the  solution  of 
boundary  value  problems  in  arbitrary  geometries.  This  Wavelet-Capacitance 
Method  is  defined  by  a  nontrivial  extension  of  the  classical  C^acitance 
Matrix,  and,  unlike  the  classical  method,  it  can  be  spectrally  accurate  [43]. 
We  use  compactly  supported  wavelets  as  a  Galerkin  basis  and  develop  a 
Wavelet-Capacitance  Matrix  method  to  handle  boundary  geometry.  For 
several  geometries  we  have  made  a  detailed  compairison  of  methods,  ex¬ 
amining  accuracy  and  rates  of  convergence.  Specifically,  we  compared  the 
Wavelet-Galerkin  method  to  standard  methods  for  the  numerical  solution 
of  the  Biharmonic  Helmholtz  equation  and  the  Reduced  Wave  equation 
in  nonseparable,  two-dimensional  geometries.  We  have  used  the  Wavelet- 
Capacitance  method  to  numerically  resolve  the  long  term  limit  of  two- 
dimensional  Euler  and  Navier-Stokes  flows  in  rectangular  and  L-sh^ed 
domains.  This  has  considerable  relevance  to  the  qualitative  behavior  of 
two-dimensional  turbulence  [70],  [68],  [69]  and  to  practical  applications  of 
the  method  to  problems  in  engineering.  We  have  developed  least-squares 
versions  of  our  algorithm  for  the  Helmholtz  equation  in  nonseparable  ge¬ 
ometries  and  examined  the  accuracy  and  convergence  of  these  methods. 

We  have  also  used  the  Hilbert  transform  of  the  compactly  supported 
wavelets  to  define  an  extended  basis  for  the  exterior  Acoustic  Helmholtz 
problem.  Surprisingly,  the  Hilbert  transform  of  a  compact  wavelet  is  a 
noncompact  wavelet  verifying  the  same  basic  equation.  The  noncompact 
form  can  be  used  to  match  the  Sommerfeld  radiation  condition.  In  sum¬ 
mary, 

•  The  Wavelet-Capacitance  Matrix  Method  is  stable  and  spectrally  ac¬ 
curate.  These  results  apply  to  general  nonseparable  d(»nains  and  all 
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ranges  of  the  parameters. 

•  The  Wavelet  algorithm  is  found  to  obtain  accurate  results  for  prob¬ 
lems  where,  for  instance,  finite  difference  methods  do  not  converge, 
or  converge  slowly,  and  where  Fourier  Spectral  methods  do  not  ^ply. 

•  For  a  fixed  level  of  discretization,  increasing  the  order  of  the  wavelet 
basis  spectrally  decreases  the  error. 

•  The  rates  of  convergence  in  sup  norm  appear  to  depend  on  the  wavelet 
basis,  DN,  and  discretization,  6x,  as 

•  The  rates  of  convergence  in  sup  norm  appear  to  be  independent  of 
the  domain  shape. 

•  Least-squares  versions  of  the  wavelet  algorithm  can  preserve  accu¬ 
racy  and  decrease  the  computation  by  more  than  a  factor  of  four. 
The  finite  difference  algorithms  would  not  allow  effective  least-square 
implementations . 

•  All  errors  (accuracy  and  convergence)  are  measured  in  the  pointwise 
sup  norm. 

•  Our  implementation  is  fast,  since  it  is  based  on  fast  (FFT)  evaluations 
for  periodic  geometry  adapted  to  nonseparable  geometry. 

•  The  basic  algorithm  ^>plies  to  one,  two  and  three  space  dimensions, 
without  essential  modification. 

•  The  Wavelet-Galerkin  method  preserves  symmetries  of  the  domain 
and  defines  a  type  of  discrete  orthogonality  that  is  very  useful  for 
further  applications  of  the  method.  These  applications  include  fast 
Domain  Decomposition  techniques. 

•  We  have  found  a  wavelet  basis  with  prescribed  rate  of  decay  at  infin¬ 
ity.  This  is  based  on  the  Hilbert  IVansform  of  compactly  supported 
wavelets  and  will  allow  the  fast  direct  solution  of  the  exterior  acoustic 
problem. 

•  To  our  ki  ...  jge,  our  algorithm  is  an  unique  extension  of  the  classi¬ 
cal  Capacitance-Matrix  method  and  should  have  several  and  diverse 
applications  to  problems  requiring  a  higher  order  accuracy. 

of  the  orthogonal  spaces  is  useful  in  the  numerical  solution  of  n<Hilinear 
systems. 
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We  have  found  that  the  Hilbert  TVansform  of  a  scaling  function  is  a 
scaling  function  with  a  prescribed  rate  of  decay  at  infinity.  TVanslates  of 
this  scaling  function  define  a  basis  on  the  real  line  that  is  useful  for  the 
numerical  solution  of  exterior  boundary  value  problems.  For  instance,  this 
would  allow  a  direct  solution  of  the  acoustic  radiation  problem. 


8.3  Summary  of  Current  Results 

8.3.1  Advantages  of  wavelets  for  the  numerical  Solu¬ 
tion  of  PDE. 

•  TVanslatesform  orthogonal  basis.  Quadratic  invariants,  including  en¬ 
ergy  and  enstrophy,  are  preserved. 

•  Locally  exact  representation  of  polynomials  by  '  r  ^nslates  of  the  scal¬ 
ing  function. 

•  Compact  support  of  basis  elements.  Localization  of  gradients  and 
stabiUty  of  numerical  methods.  Attenuation  of  Gibb’s  Phenomena. 
EflBcient  parallel  implementations. 

•  The  degree  of  approximation  of  basis  elements  is  greater  than  their 
degree  of  smoothness.  Good  approximation  of  smooth  functions.  At¬ 
tenuation  of  Gibb’s  Phenomena  at  discontinuities. 

•  Multiresolution  of  scales.  Connections  with  multigrid  and  adaptive 
methods. 

•  Fast  evaluation  of  basis  elements  by  use  of  the  scaling  relation. 

•  Exact  evaluation  of  wavelet  functionals  by  algebraic  methods.  Exact 
evaluation  of  functionals  required  in  the  Galerkin  method. 

8.3.2  Principal  Results 

•  Development  of  Wavelet-Galerkin  method.  Exact  evaluation  of  Con¬ 
nection  Coefficients.  Shock  Capture  and  T\irbulence  Closure. 

•  Development  of  wavelet  methods  for  PDE  boundary  problems  by  an 
extension  of  the  Capacitance  Matrix  method.  Comparison  with  stan¬ 
dard  finite  difference  methods.  Spectral  ctmvergence  of  the  wavelet 
methods. 

•  Long  time  limit  of  2D-Euler  and  Navier-Stdces  flows  with  geometry. 
Implications  for  Two-Dimensional  Turbulence. 
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8.4  A  One  Dimensional  Example:  Burgers 
Equation 

Burgers  equation  is  an  idealized  mathematical  representation  for  the  devel¬ 
opment  of  a  "breaker”  as  an  ocean  wave  feels  the  bottom  when  it  reaches 
a  shallow  beach.  The  bottom  of  the  wave  is  retarded  and  the  slope  of  the 
wavefront  increases  until  it  is  nearly  vertical.  Then  the  wave  breaks  and 
crashes  on  the  beach. 

Aware  developed  the  Wavelet-Galerkin  method  to  solve  this  kind  of 
problem.  The  results  have  been  reported  m  several  p^ers  (cp.  [27],  [32], 
[66])  to  which  the  reader  is  referred  for  the  details. 

Burgers  equation  is  perhaps  the  simplest  example  of  a  non  linear  partial 
differential  equation.  The  equation  is 

Ut  +  uu,  =  auxx, 

where  a  is  the  viscosity.  When  the  viscosity  is  null,  the  field  will  develop 
a  shock  in  a  time  t*  =:  min{l/uc(z,0)}.  The  nonlinearity  is  the  quadratic 
term,  uu^,  which  causes  small  irregularities  in  u{x,t)  to  grow  as  time  passes. 
Ultimately,  the  gradient  of  u  becomes  too  large  to  be  accurately  estimated 
by  a  difference  quotient  on  a  grid  of  fixed  size,  and  the  numerical  solution 
becomes  unstable  and  "crashes  on  the  beach”  too.  Conventional  numerical 
methods  for  solving  this  equation  are  unable  to  cope  with  the  consequences 
of  the  instability  of  the  numerical  derivative.  Wavelet  methods,  based  on 
the  use  of  connection  coeffiicents  to  estimate  derivatives,  is  much  less  sen¬ 
sitive  to  the  grid  size  and  tempers  the  instability.  The  result  is  an  accurate 
solution  without  requiring  fine  grids  or  increased  computational  cost. 

An  example  will  help  to  demonstrate  the  power  of  the  Wavelet-Galerkin 
method.  For  the  sake  of  simplicity  of  descriptim  we  will  consider  the  Burg¬ 
ers’  equation  evolution  of  a  one  dismensional  periodic  wave;  this  is  the  same 
thing  as  considering  a  wave  that  travels  around  a  circle.  This  assumption 
separates  the  problem  of  the  general  numerical  stability  and  accuracy  of  the 
method  from  questions  that  concern  the  treatment  of  boundary  conditions.^ 


’  Aware  and  ita  aaiodatee  at  Rice  Univenity  have  developed  powerful  and  efficient 
general  methoda  for  employing  wavelet#  with  boundary  conditiona.  Cp.  [as],  [71].  143]. 
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Figure  8-1:  Initial  condition  for  the  solution  of  Burgers  equation. 
u(x,0) 
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Suppose  that  at  time  <  =  0  the  initial  waveform  is  the  sinusoid 
u(«,o)  =  —  Bin(2ir*/64) 

suitably  normalized,  and  that  a  grid  of  64  points  per  unit  interval  is  em¬ 
ployed  in  the  space  domain;  the  initial  function  is  shown  in  figure  8-1. 

Figure  8-2  illustrates  the  evoluticm  of  the  solution  employing  the  stan¬ 
dard  de-aliased  spectral  method  on  the  64  point  grid.  In  the  figure  the  time 
axis  runs  from  the  upper  right  to  the  lower  left,  so  that  the  solution  for  the 
final  time  step  is  the  leftmost  cross  section  of  the  surface,  and  the  initial  si¬ 
nusoid  is  the  rightmost  cross  section.  The  vertical  scale  measures  the  wave 
amplitude  u(x,t).  A  total  of  450  equispaced  time  steps  are  represented. 

The  figure  shows  that  relatively  soon  a  nearly  vertical  shock  wave  de¬ 
velops.  Since  the  equation  contains  a  diffusive  term,  ther  peak  value  of  the 
shock  decays  as  time  passes.  Then,  as  the  figure  illustrates,  the  shock  give 
birth  to  an  oscillation.  The  shock  amplitude  increase  and  the  frequency 
oscillation  rapidly  grows  in  amplitude  and  propagates  throughout  the  so¬ 
lution.  The  source  of  the  oscillation  is  numerical  error:  error  in  estimating 
the  gradient,  and  round-off  error  in  the  calculations.  The  final  time  slice 
shows  an  completely  unreliable  numerical  solution. 

Although  the  errors  dominate  the  solution,  their  local  space  average 
appears  to  be  less  sensitive  to  these  errors.  Figure  8-3  shows  the  result  of 
applying  a  3-point  average  to  the  numerical  solution  at  each  time  step.  This 
simple  average  provides  a  generally  accurate  picture  of  the  solution,  but  it 
is  neither  qunatitatively  correct  nor  is  it  accurate  in  the  qualitative  details 
-  the  shock  front  begins  to  grow  again  as  time  evolves.  These  problems 
are  typical  when  the  de-aliased  spectral  method  is  applied  to  non  linear 
problems  that  describe  transient,  localized  phenomena  that  have  rapidly 
varying  gradients. 

The  Wavelet-Galerkin  method  avoids  these  difficulties.  The  local  sup¬ 
port  of  the  wavelet  basis  functions  reduces  or  eliminates  propagation  of  er¬ 
rors  to  distant  regions,  and  the  high  accuracy  of  the  connection  coefficient 
derivative  estimates  (cp.[28],  [36])  reduces  the  error  in  gradient  estimates. 
When  combined,  these  features  lead  -  for  a  preset  level  of  accuracy  -  to 
lower  computational  cost  because  the  can  employ  a  coarser  grid. 

Figure  8-4  displays  the  wavelet  numerical  solution  of  the  same  prob¬ 
lem  using  the  Daubechies  wavelet  basis  D3.  The  soluticm  has  not  been 
smoothed.  Figure  8-4  shows  the  same  solution  with  3-point  snnoothing. 
This  provides  a  small  incremental  improvement  in  the  accuracy  of  the  so¬ 
lution. 
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Figure  8-2:  Evolution  of  the  numerical  solution  of  Burgers  equation  em¬ 
ploying  unsmoothed  de-aliased  spectra)  method. 
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Figure  8-3:  Evolution  of  the  numerical  solution  of  Burgers  equation  em¬ 
ploying  smoothed  de-aliased  spectral  method 
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Figure  8-4:  Evolution  of  the  numerical  solution  of  Burgers  equation  em¬ 
ploying  unsmoothed  Wavelet-Galerkin  method. 
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Figure  8-5:  Evolution  of  the  numerical  solution  of  Burgers  equation  em¬ 
ploying  smoothed  Wavelet-Galerkin  method. 
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